Objective

The purpose of this notebook is to develop a supervised Machine Learning model that can predict which employees are likely to leave the company. Through predictive modeling and feature engineering, the ultimate goal of the HR department is to understand what factors lead to employees’ departure and to reduce the rate of attrition. The original problem statement required a logistic regression model, and therefore I will first focus on this algorithm. Then we will explore other models and compare their performance. The dataset was cleaned in the Data Wrangling notebook and visualized in the Data Exploration notebook.

I. Load the data

  1. Import relevant packages.
  2. Load the cleaned dataset from the Data Wrangling notebook.

II. Machine Learning Models

  1. About Metrics
  2. Which metric is best for our model?
  3. Global Functions for performance measures.
  4. Prepare the data for modeling
    • Partition the data into train and test sets.
    • Scale Numerical Features
A. Logistic Regression
  1. Baseline Model
  2. Model Tuning
  3. Cross Validation
B. Random Forest
  1. Baseline Model
  2. Model Tuning and Cross Validation
C. Neural Network
  1. Baseline Model
  2. Model Tuning and Cross Validation

I. Load and prepare data

1. Import relevant packages.
library(ggplot2)
library(repr)
library(caret)
## Loading required package: lattice
library(ROCR)
library(pROC)
## Type 'citation("pROC")' for a citation.
## 
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ tibble  3.0.1     ✓ dplyr   1.0.0
## ✓ tidyr   1.0.3     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ✓ purrr   0.3.4
## ── Conflicts ────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## x purrr::lift()   masks caret::lift()
library(magrittr) 
## 
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
## 
##     set_names
## The following object is masked from 'package:tidyr':
## 
##     extract
library(randomForest)
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## The following object is masked from 'package:dplyr':
## 
##     combine
## The following object is masked from 'package:ggplot2':
## 
##     margin
library(nnet)

options(repr.plot.width=4, repr.plot.height=4) # Set the initial plot area dimensions
2. Load the data prepared in the Data Wrangling notebook.
dt <- read.csv("FinalData.csv")

# Remove the first column created by exporting/loading the csv file.
dt %<>% select(-1)

# Convert ordered categorical columns to factors.
dt$Education <- ordered(dt$Education, 
                        levels = c("Below College", "College", "Bachelor", "Master", "Doctor"))

dt$BusinessTravel <- ordered(dt$BusinessTravel, 
                              levels = c("Non-Travel", "Travel-Rarely", "Travel-Frequently"))

dt$JobLevel <- ordered(dt$JobLevel, 
                       levels = c(1, 2, 3, 4, 5), 
                       labels = c("1", "2", "3", "4", "5"))

dt$StockOptionLevel <- ordered(dt$StockOptionLevel, 
                               levels = c(0, 1, 2, 3), 
                               labels = c("0", "1", "2", "3"))


dt$EnvironmentSatisfaction <- ordered(dt$EnvironmentSatisfaction,
                                      levels = c("N/A","Low", "Medium", "High", "Very High"))

dt$JobInvolvement <- ordered(dt$JobInvolvement, 
                             levels = c("Low", "Medium", "High", "Very High"))

dt$JobSatisfaction <- ordered(dt$JobSatisfaction, 
                              levels = c("N/A","Low", "Medium", "High", "Very High"))

dt$PerformanceRating <- ordered(dt$PerformanceRating, 
                                levels = c("Low", "Good", "Excellent", "Outstanding"))

dt$WorkLifeBalance <- ordered(dt$WorkLifeBalance, 
                              levels = c("N/A","Bad", "Good", "Better", "Best"))

dt$Attrition <- ordered(dt$Attrition, 
                              levels = c("Stayed", "Left"))

# Convert other categorical variables to the correct type.
catcols <- c("Department", "EducationField", "Gender", "JobRole", "MaritalStatus")
dt %<>% mutate_at(catcols, factor)

A quick view to make sure the data was loaded correctly.

dim(dt)
## [1] 4410   26
# There are 4410 instances, or employees documented in the dataset, with 26 variables.
head(dt)
##   Age Attrition    BusinessTravel             Department DistanceFromHome
## 1  51    Stayed     Travel-Rarely                  Sales                6
## 2  31      Left Travel-Frequently Research & Development               10
## 3  32    Stayed Travel-Frequently Research & Development               17
## 4  38    Stayed        Non-Travel Research & Development                2
## 5  32    Stayed     Travel-Rarely Research & Development               10
## 6  46    Stayed     Travel-Rarely Research & Development                8
##       Education EducationField Gender JobLevel                   JobRole
## 1       College  Life Sciences Female        1 Healthcare Representative
## 2 Below College  Life Sciences Female        1        Research Scientist
## 3        Master          Other   Male        4           Sales Executive
## 4        Doctor  Life Sciences   Male        3           Human Resources
## 5 Below College        Medical   Male        1           Sales Executive
## 6      Bachelor  Life Sciences Female        4         Research Director
##   MaritalStatus MonthlyIncome NumCompaniesWorked PercentSalaryHike
## 1       Married        131160                  1                11
## 2        Single         41890                  0                23
## 3       Married        193280                  1                15
## 4       Married         83210                  3                11
## 5        Single         23420                  4                12
## 6       Married         40710                  3                13
##   StockOptionLevel TotalWorkingYears TrainingTimesLastYear YearsAtCompany
## 1                0                 1                     6              1
## 2                1                 6                     3              5
## 3                3                 5                     2              5
## 4                3                13                     5              8
## 5                2                 9                     2              6
## 6                0                28                     5              7
##   YearsSinceLastPromotion YearsWithCurrManager EnvironmentSatisfaction
## 1                       0                    0                    High
## 2                       1                    4                    High
## 3                       0                    3                  Medium
## 4                       7                    5               Very High
## 5                       0                    4               Very High
## 6                       7                    7                    High
##   JobSatisfaction WorkLifeBalance JobInvolvement PerformanceRating AvgHrs
## 1       Very High            Good           High         Excellent   7.37
## 2          Medium            Best         Medium       Outstanding   7.72
## 3          Medium             Bad           High         Excellent   7.01
## 4       Very High          Better         Medium         Excellent   7.19
## 5             Low          Better           High         Excellent   8.01
## 6          Medium            Good           High         Excellent  10.80
str(dt)
## 'data.frame':    4410 obs. of  26 variables:
##  $ Age                    : int  51 31 32 38 32 46 28 29 31 25 ...
##  $ Attrition              : Ord.factor w/ 2 levels "Stayed"<"Left": 1 2 1 1 1 1 2 1 1 1 ...
##  $ BusinessTravel         : Ord.factor w/ 3 levels "Non-Travel"<"Travel-Rarely"<..: 2 3 3 1 2 2 2 2 2 1 ...
##  $ Department             : Factor w/ 3 levels "Human Resources",..: 3 2 2 2 2 2 2 2 2 2 ...
##  $ DistanceFromHome       : int  6 10 17 2 10 8 11 18 1 7 ...
##  $ Education              : Ord.factor w/ 5 levels "Below College"<..: 2 1 4 5 1 3 2 3 3 4 ...
##  $ EducationField         : Factor w/ 6 levels "Human Resources",..: 2 2 5 2 4 2 4 2 2 4 ...
##  $ Gender                 : Factor w/ 2 levels "Female","Male": 1 1 2 2 2 1 2 2 2 1 ...
##  $ JobLevel               : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 1 1 4 3 1 4 2 2 3 4 ...
##  $ JobRole                : Factor w/ 9 levels "Healthcare Representative",..: 1 7 8 2 8 6 8 8 3 3 ...
##  $ MaritalStatus          : Factor w/ 3 levels "Divorced","Married",..: 2 3 2 2 3 2 3 2 2 1 ...
##  $ MonthlyIncome          : int  131160 41890 193280 83210 23420 40710 58130 31430 20440 134640 ...
##  $ NumCompaniesWorked     : int  1 0 1 3 4 3 2 2 0 1 ...
##  $ PercentSalaryHike      : int  11 23 15 11 12 13 20 22 21 13 ...
##  $ StockOptionLevel       : Ord.factor w/ 4 levels "0"<"1"<"2"<"3": 1 2 4 4 3 1 2 4 1 2 ...
##  $ TotalWorkingYears      : int  1 6 5 13 9 28 5 10 10 6 ...
##  $ TrainingTimesLastYear  : int  6 3 2 5 2 5 2 2 2 2 ...
##  $ YearsAtCompany         : int  1 5 5 8 6 7 0 0 9 6 ...
##  $ YearsSinceLastPromotion: int  0 1 0 7 0 7 0 0 7 1 ...
##  $ YearsWithCurrManager   : int  0 4 3 5 4 7 0 0 8 5 ...
##  $ EnvironmentSatisfaction: Ord.factor w/ 5 levels "N/A"<"Low"<"Medium"<..: 4 4 3 5 5 4 2 2 3 3 ...
##  $ JobSatisfaction        : Ord.factor w/ 5 levels "N/A"<"Low"<"Medium"<..: 5 3 3 5 2 3 4 3 5 2 ...
##  $ WorkLifeBalance        : Ord.factor w/ 5 levels "N/A"<"Bad"<"Good"<..: 3 5 2 4 4 3 2 4 4 4 ...
##  $ JobInvolvement         : Ord.factor w/ 4 levels "Low"<"Medium"<..: 3 2 3 2 3 3 3 3 3 3 ...
##  $ PerformanceRating      : Ord.factor w/ 4 levels "Low"<"Good"<"Excellent"<..: 3 4 3 3 3 3 4 4 4 3 ...
##  $ AvgHrs                 : num  7.37 7.72 7.01 7.19 8.01 10.8 6.92 6.73 7.24 7.08 ...


II. Machine Learning Models

1. About metrics

Before diving into developing and comparing Machine Learning models, we must first determine how we will assess how the models are performing. Using multiple metrics to evaluate the models allows us to understand their strengths and weaknesses. Which metric to focus on depends on the purpose of the model. The following metrics are commonly used in classification problems.

Confusion matrix

This matrix lays out correctly and incorrectly classified cases in a tabular format. For the binary (two-class) case the confusion matrix is organized as follows:

                 Scored Positive   Scored Negative
Actual Positive  True Positive     False Negative
Actual Negative  False Positive    True Negative


In our model, the “Left” category of the Attrition feature is defined as positive and the “Stayed” category as negative. Therefore our confusion matrix will be:

               Scored Left       Scored Stayed
Actual Left    True Positive     False Negative
Actual Stayed  False Positive    True Negative


Accuracy is the proportion of all correctly classified cases: \[Accuracy = \frac{TP+TN}{TP+FP+TN+FN}\] This metric can be misleading on imbalanced datasets like ours, and is therefore not the best metric for measuring our model’s performance.

Precision, also called Positive Predictive Value, is the fraction of correctly classified positive cases out of all cases classified as positive:

\[Precision = \frac{TP}{TP+FP}\]

Sensitivity, also called Recall or True Positive Rate, is the proportion of actual positive cases that are correctly identified:

\[Sensitivity = \frac{TP}{TP+FN}\]

Specificity, also called Selectivity or True Negative Rate, is the proportion of actual negative cases that are correctly identified:

\[Specificity = \frac{TN}{TN+FP}\]

The Receiver Operating Characteristic (ROC) curve shows the tradeoff between the True Positive Rate (Sensitivity) and the False Positive Rate, while the Area Under the Curve (AUC) is the integral of the ROC curve. The higher the AUC, the smaller the increase in false positive rate needed to achieve a given true positive rate, so a classification model with a higher AUC is generally considered to perform better.
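As a quick sanity check, all of these formulas can be computed directly from the four confusion-matrix counts. A minimal sketch, using hypothetical counts chosen purely for illustration:

```r
# Hypothetical confusion-matrix counts (illustration only)
TP <- 57; FN <- 156; FP <- 25; TN <- 1084

accuracy    <- (TP + TN) / (TP + FP + TN + FN)
precision   <- TP / (TP + FP)   # Positive Predictive Value
sensitivity <- TP / (TP + FN)   # Recall / True Positive Rate
specificity <- TN / (TN + FP)   # True Negative Rate

round(c(Accuracy = accuracy, Precision = precision,
        Sensitivity = sensitivity, Specificity = specificity), 3)
```

Note how a model can score high Accuracy and Specificity while its Sensitivity stays low when positives are rare; this is exactly why Accuracy alone is unreliable for imbalanced data like ours.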

2. Which metric is best for our model?

For the company, failing to identify an employee who will leave (False Negative) is more costly than incorrectly predicting that an employee will leave (False Positive). Therefore we must choose a metric that penalizes failure to identify positive cases (False Negatives) most heavily. In other words, Sensitivity will be more important than Specificity.

In our case, Sensitivity will be the proportion of employees correctly classified as having left out of all employees who actually left.

Accuracy is a poor metric for imbalanced data where misclassifications carry different costs. Similarly, while AUC is often used for binary classification, it is important to remember that it can also be misleading when the data are not balanced.

Precision usually has an inverse relationship with Sensitivity; since missing a leaver is the costlier error here, we will focus on maximizing Sensitivity.
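This inverse relationship is easy to see by varying the classification threshold. A toy sketch (made-up probabilities and labels, not the real model): lowering the threshold labels more employees as “Left”, which raises Sensitivity but tends to lower Precision.

```r
# Hypothetical predicted probabilities of leaving and true outcomes
probs  <- c(0.10, 0.25, 0.35, 0.55, 0.70, 0.90)
actual <- c("Stayed", "Left", "Stayed", "Left", "Stayed", "Left")

# Compute Sensitivity and Precision at a given decision threshold
metrics_at <- function(threshold) {
  pred <- ifelse(probs >= threshold, "Left", "Stayed")
  tp <- sum(pred == "Left"   & actual == "Left")
  fp <- sum(pred == "Left"   & actual == "Stayed")
  fn <- sum(pred == "Stayed" & actual == "Left")
  c(Sensitivity = tp / (tp + fn), Precision = tp / (tp + fp))
}

metrics_at(0.5)  # stricter threshold
metrics_at(0.2)  # looser threshold: Sensitivity rises, Precision falls
```

We will exploit exactly this lever later when tuning the model’s decision threshold.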


3. Global functions for measuring performance metrics.

The first function will generate the confusion matrix and calculate metric values; the second will reveal which variables the model deemed important.

perf_met <- function(df) {
  #Confusion Matrix Summary
  cm <- suppressWarnings(confusionMatrix(data = as.factor(df$score), 
                                         reference = as.factor(df$Attrition), 
                                         positive = "Left"))
  print(cm)
  
  roc_obj <- roc(df$Attrition, df$probs)
  cat(paste('AUC       =', as.character(round(auc(roc_obj),3)),'\n'))

  table <- data.frame(cm$table)
  
  plotTable <- table %>%
    mutate(Correctness = ifelse(table$Prediction == table$Reference, "Correct", "Incorrect")) %>%
    group_by(Reference) %>%
    mutate(Proportion = Freq/sum(Freq))
  
  # Fill alpha relative to sensitivity/specificity by proportional outcomes within reference groups 
  ggplot(data = plotTable, 
         mapping = aes(x=Reference, y=Prediction, fill=Correctness, alpha=Proportion)) + 
      geom_tile() +
      geom_text(aes(label=Freq), vjust=.5, fontface="bold", alpha=1) +
      scale_fill_manual(values = c(Correct="#264d73", Incorrect="#b30000")) +
      xlim(rev(levels(table$Reference))) +
      ylim(levels(table$Prediction)) +
      theme_light()
}

## Function to show which features are important.
feature_imp <- function(mod) {
    imp = varImp(mod)
    
    plot <- ggplot(imp, aes(x=reorder(rownames(imp),Overall), y=Overall)) +
        geom_point(color="skyblue", size=2, alpha=0.8) +
        geom_segment(aes(x=rownames(imp), xend=rownames(imp), y=0, yend=Overall), color='skyblue') +
        xlab('Variable') + 
        ylab('Overall Importance') +
        theme_light() +
        coord_flip() 
  print(anova(mod, test="Chisq"))
  print(plot)
}

4. Prepare the data for modeling

  • Partition the data into train and test sets. We will use this partition for all models for comparison purposes.
set.seed(1955)
## Randomly sample cases to create independent training and test data
partition = createDataPartition(dt[,'Attrition'], times = 1, p = 0.7, list = FALSE)
dt_train = dt[partition,] # Create the training sample
dim(dt_train)
## [1] 3088   26
dt_test = dt[-partition,] # Create the test sample
dim(dt_test)
## [1] 1322   26
  • Scale numeric features so that variables measured on different scales contribute comparably.
numcols <- dt %>% select_if(is.numeric) %>%  colnames
print(numcols)
##  [1] "Age"                     "DistanceFromHome"       
##  [3] "MonthlyIncome"           "NumCompaniesWorked"     
##  [5] "PercentSalaryHike"       "TotalWorkingYears"      
##  [7] "TrainingTimesLastYear"   "YearsAtCompany"         
##  [9] "YearsSinceLastPromotion" "YearsWithCurrManager"   
## [11] "AvgHrs"
preProcValues <- preProcess(dt_train[,numcols], method = c("center", "scale"))

dt_train[,numcols] = predict(preProcValues, dt_train[,numcols])
dt_test[,numcols] = predict(preProcValues, dt_test[,numcols])
head(dt_train[,numcols])
##          Age DistanceFromHome MonthlyIncome NumCompaniesWorked
## 1  1.5351693      -0.39949765     1.4088205         -0.6721110
## 3 -0.5410709       0.94741412     2.7295641         -0.6721110
## 5 -0.5410709       0.09028845    -0.8818574          0.5352647
## 6  0.9887903      -0.15460460    -0.5142519          0.1328061
## 7 -0.9781741       0.21273497    -0.1438824         -0.2696525
## 9 -0.6503467      -1.01173027    -0.9452157         -1.0745696
##   PercentSalaryHike TotalWorkingYears TrainingTimesLastYear YearsAtCompany
## 1       -1.15935900        -1.3227615             2.5011973    -0.97857482
## 3       -0.06678233        -0.8096420            -0.6088076    -0.33730174
## 5       -0.88621483        -0.2965226            -0.6088076    -0.17698348
## 6       -0.61307067         2.1407948             1.7236961    -0.01666521
## 7        1.29893850        -0.8096420            -0.6088076    -1.13889309
## 9        1.57208267        -0.1682427            -0.6088076     0.30397133
##   YearsSinceLastPromotion YearsWithCurrManager     AvgHrs
## 1              -0.6824716          -1.16099726 -0.2538199
## 3              -0.6824716          -0.32872313 -0.5216240
## 5              -0.6824716          -0.05129842  0.2222764
## 6               1.4623824           0.78097571  2.2977583
## 7              -0.6824716          -1.16099726 -0.5885750
## 9               1.4623824           1.05840042 -0.3505269
head(dt_train)
##          Age Attrition    BusinessTravel             Department
## 1  1.5351693    Stayed     Travel-Rarely                  Sales
## 3 -0.5410709    Stayed Travel-Frequently Research & Development
## 5 -0.5410709    Stayed     Travel-Rarely Research & Development
## 6  0.9887903    Stayed     Travel-Rarely Research & Development
## 7 -0.9781741      Left     Travel-Rarely Research & Development
## 9 -0.6503467    Stayed     Travel-Rarely Research & Development
##   DistanceFromHome     Education EducationField Gender JobLevel
## 1      -0.39949765       College  Life Sciences Female        1
## 3       0.94741412        Master          Other   Male        4
## 5       0.09028845 Below College        Medical   Male        1
## 6      -0.15460460      Bachelor  Life Sciences Female        4
## 7       0.21273497       College        Medical   Male        2
## 9      -1.01173027      Bachelor  Life Sciences   Male        3
##                     JobRole MaritalStatus MonthlyIncome NumCompaniesWorked
## 1 Healthcare Representative       Married     1.4088205         -0.6721110
## 3           Sales Executive       Married     2.7295641         -0.6721110
## 5           Sales Executive        Single    -0.8818574          0.5352647
## 6         Research Director       Married    -0.5142519          0.1328061
## 7           Sales Executive        Single    -0.1438824         -0.2696525
## 9     Laboratory Technician       Married    -0.9452157         -1.0745696
##   PercentSalaryHike StockOptionLevel TotalWorkingYears TrainingTimesLastYear
## 1       -1.15935900                0        -1.3227615             2.5011973
## 3       -0.06678233                3        -0.8096420            -0.6088076
## 5       -0.88621483                2        -0.2965226            -0.6088076
## 6       -0.61307067                0         2.1407948             1.7236961
## 7        1.29893850                1        -0.8096420            -0.6088076
## 9        1.57208267                0        -0.1682427            -0.6088076
##   YearsAtCompany YearsSinceLastPromotion YearsWithCurrManager
## 1    -0.97857482              -0.6824716          -1.16099726
## 3    -0.33730174              -0.6824716          -0.32872313
## 5    -0.17698348              -0.6824716          -0.05129842
## 6    -0.01666521               1.4623824           0.78097571
## 7    -1.13889309              -0.6824716          -1.16099726
## 9     0.30397133               1.4623824           1.05840042
##   EnvironmentSatisfaction JobSatisfaction WorkLifeBalance JobInvolvement
## 1                    High       Very High            Good           High
## 3                  Medium          Medium             Bad           High
## 5               Very High             Low          Better           High
## 6                    High          Medium            Good           High
## 7                     Low            High             Bad           High
## 9                  Medium       Very High          Better           High
##   PerformanceRating     AvgHrs
## 1         Excellent -0.2538199
## 3         Excellent -0.5216240
## 5         Excellent  0.2222764
## 6         Excellent  2.2977583
## 7       Outstanding -0.5885750
## 9       Outstanding -0.3505269

Let’s start with the Logistic Regression model, as requested by the client.

A. Logistic Regression

Create copies of the partitioned datasets. (This step is not strictly necessary, but I prefer to keep the originals untouched.)

dLM_train <- dt_train
dLM_test <- dt_test
head(dLM_train)
##          Age Attrition    BusinessTravel             Department
## 1  1.5351693    Stayed     Travel-Rarely                  Sales
## 3 -0.5410709    Stayed Travel-Frequently Research & Development
## 5 -0.5410709    Stayed     Travel-Rarely Research & Development
## 6  0.9887903    Stayed     Travel-Rarely Research & Development
## 7 -0.9781741      Left     Travel-Rarely Research & Development
## 9 -0.6503467    Stayed     Travel-Rarely Research & Development
##   DistanceFromHome     Education EducationField Gender JobLevel
## 1      -0.39949765       College  Life Sciences Female        1
## 3       0.94741412        Master          Other   Male        4
## 5       0.09028845 Below College        Medical   Male        1
## 6      -0.15460460      Bachelor  Life Sciences Female        4
## 7       0.21273497       College        Medical   Male        2
## 9      -1.01173027      Bachelor  Life Sciences   Male        3
##                     JobRole MaritalStatus MonthlyIncome NumCompaniesWorked
## 1 Healthcare Representative       Married     1.4088205         -0.6721110
## 3           Sales Executive       Married     2.7295641         -0.6721110
## 5           Sales Executive        Single    -0.8818574          0.5352647
## 6         Research Director       Married    -0.5142519          0.1328061
## 7           Sales Executive        Single    -0.1438824         -0.2696525
## 9     Laboratory Technician       Married    -0.9452157         -1.0745696
##   PercentSalaryHike StockOptionLevel TotalWorkingYears TrainingTimesLastYear
## 1       -1.15935900                0        -1.3227615             2.5011973
## 3       -0.06678233                3        -0.8096420            -0.6088076
## 5       -0.88621483                2        -0.2965226            -0.6088076
## 6       -0.61307067                0         2.1407948             1.7236961
## 7        1.29893850                1        -0.8096420            -0.6088076
## 9        1.57208267                0        -0.1682427            -0.6088076
##   YearsAtCompany YearsSinceLastPromotion YearsWithCurrManager
## 1    -0.97857482              -0.6824716          -1.16099726
## 3    -0.33730174              -0.6824716          -0.32872313
## 5    -0.17698348              -0.6824716          -0.05129842
## 6    -0.01666521               1.4623824           0.78097571
## 7    -1.13889309              -0.6824716          -1.16099726
## 9     0.30397133               1.4623824           1.05840042
##   EnvironmentSatisfaction JobSatisfaction WorkLifeBalance JobInvolvement
## 1                    High       Very High            Good           High
## 3                  Medium          Medium             Bad           High
## 5               Very High             Low          Better           High
## 6                    High          Medium            Good           High
## 7                     Low            High             Bad           High
## 9                  Medium       Very High          Better           High
##   PerformanceRating     AvgHrs
## 1         Excellent -0.2538199
## 3         Excellent -0.5216240
## 5         Excellent  0.2222764
## 6         Excellent  2.2977583
## 7       Outstanding -0.5885750
## 9       Outstanding -0.3505269
head(dLM_test)
##           Age Attrition    BusinessTravel             Department
## 2  -0.6503467      Left Travel-Frequently Research & Development
## 4   0.1145839    Stayed        Non-Travel Research & Development
## 8  -0.8688983    Stayed     Travel-Rarely Research & Development
## 14  1.0980661      Left        Non-Travel Research & Development
## 17 -1.7431047    Stayed     Travel-Rarely Research & Development
## 21 -1.1967257    Stayed Travel-Frequently Research & Development
##    DistanceFromHome     Education EducationField Gender JobLevel
## 2        0.09028845 Below College  Life Sciences Female        1
## 4       -0.88928374        Doctor  Life Sciences   Male        3
## 8        1.06986064      Bachelor  Life Sciences   Male        2
## 14      -1.01173027 Below College        Medical   Male        1
## 17      -0.76683722       College  Life Sciences   Male        1
## 21      -1.01173027        Master          Other   Male        2
##                  JobRole MaritalStatus MonthlyIncome NumCompaniesWorked
## 2     Research Scientist        Single    -0.4891637         -1.0745696
## 4        Human Resources       Married     0.3893476          0.1328061
## 8        Sales Executive       Married    -0.7115555         -0.2696525
## 14    Research Scientist       Married    -0.1547256         -0.6721110
## 17 Laboratory Technician        Single    -0.4840610         -0.6721110
## 21 Laboratory Technician      Divorced     0.8413600         -0.6721110
##    PercentSalaryHike StockOptionLevel TotalWorkingYears TrainingTimesLastYear
## 2          2.1183710                1        -0.6813621             0.1686936
## 4         -1.1593590                3         0.2165969             1.7236961
## 8          1.8452268                3        -0.1682427            -0.6088076
## 14        -1.1593590                2        -0.1682427             0.9461948
## 17        -0.8862148                3        -1.0662017             0.1686936
## 21         0.7526502                0        -0.6813621             0.1686936
##    YearsAtCompany YearsSinceLastPromotion YearsWithCurrManager
## 2      -0.3373017              -0.3760639          -0.05129842
## 4       0.1436531               1.4623824           0.22612629
## 8      -1.1388931              -0.6824716          -1.16099726
## 14      0.4642896               2.0751978           1.33582514
## 17     -0.6579383              -0.3760639          -1.16099726
## 21     -0.1769835              -0.3760639          -0.05129842
##    EnvironmentSatisfaction JobSatisfaction WorkLifeBalance JobInvolvement
## 2                     High          Medium            Best         Medium
## 4                Very High       Very High          Better         Medium
## 8                      Low          Medium          Better           High
## 14                     Low          Medium            Good         Medium
## 17               Very High            High            Best         Medium
## 21                    High          Medium             Bad           High
##    PerformanceRating       AvgHrs
## 2        Outstanding  0.006545263
## 4          Excellent -0.387721916
## 8        Outstanding -0.729916072
## 14         Excellent  1.256297831
## 17         Excellent -0.811745109
## 21         Excellent -0.090161781


1. Fit a baseline Logistic Regression model.

set.seed(1955)
glm_mod <- glm(Attrition ~ ., family = binomial, data = dLM_train)

Let’s look at the summary of the model.

anova(glm_mod, test="Chisq")
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: Attrition
## 
## Terms added sequentially (first to last)
## 
## 
##                         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                                     3087     2728.4              
## Age                      1   70.067      3086     2658.3 < 2.2e-16 ***
## BusinessTravel           2   50.880      3084     2607.4 8.946e-12 ***
## Department               2   33.530      3082     2573.9 5.237e-08 ***
## DistanceFromHome         1    0.001      3081     2573.9   0.98070    
## Education                4    9.497      3077     2564.4   0.04981 *  
## EducationField           5    7.849      3072     2556.5   0.16478    
## Gender                   1    1.281      3071     2555.3   0.25770    
## JobLevel                 4    5.836      3067     2549.4   0.21171    
## JobRole                  8   13.019      3059     2536.4   0.11120    
## MaritalStatus            2   61.618      3057     2474.8 4.166e-14 ***
## MonthlyIncome            1    2.335      3056     2472.4   0.12651    
## NumCompaniesWorked       1   45.589      3055     2426.9 1.459e-11 ***
## PercentSalaryHike        1    2.458      3054     2424.4   0.11691    
## StockOptionLevel         3    2.660      3051     2421.7   0.44700    
## TotalWorkingYears        1   41.635      3050     2380.1 1.100e-10 ***
## TrainingTimesLastYear    1    6.517      3049     2373.6   0.01069 *  
## YearsAtCompany           1    0.048      3048     2373.6   0.82728    
## YearsSinceLastPromotion  1   23.943      3047     2349.6 9.924e-07 ***
## YearsWithCurrManager     1   31.283      3046     2318.3 2.230e-08 ***
## EnvironmentSatisfaction  4   58.048      3042     2260.3 7.454e-12 ***
## JobSatisfaction          4   46.556      3038     2213.7 1.887e-09 ***
## WorkLifeBalance          4   26.197      3034     2187.5 2.888e-05 ***
## JobInvolvement           3    9.117      3031     2178.4   0.02777 *  
## PerformanceRating        1    0.408      3030     2178.0   0.52308    
## AvgHrs                   1  130.107      3029     2047.9 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Using the trained model, make predictions for the test data.

dLM_test %<>% mutate(probs= predict(glm_mod, newdata=dLM_test, type = 'response'))

# Score cases as "Left" when the predicted probability meets the threshold.
score_model <- function(df, threshold){
    df %>% mutate(score = ifelse(probs < threshold, "Stayed", "Left"))
}

dLM_test = score_model(dLM_test, 0.5)
dLM_test[1:10, c('Attrition','probs','score')]
##    Attrition      probs  score
## 1       Left 0.24513583 Stayed
## 2     Stayed 0.00343613 Stayed
## 3     Stayed 0.19882036 Stayed
## 4       Left 0.15731848 Stayed
## 5     Stayed 0.29864046 Stayed
## 6     Stayed 0.21648992 Stayed
## 7     Stayed 0.11884501 Stayed
## 8     Stayed 0.06457121 Stayed
## 9     Stayed 0.83219361   Left
## 10    Stayed 0.03177042 Stayed
perf_met(dLM_test)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed   1084  156
##     Left       25   57
##                                           
##                Accuracy : 0.8631          
##                  95% CI : (0.8434, 0.8812)
##     No Information Rate : 0.8389          
##     P-Value [Acc > NIR] : 0.00824         
##                                           
##                   Kappa : 0.3261          
##                                           
##  Mcnemar's Test P-Value : < 2e-16         
##                                           
##             Sensitivity : 0.26761         
##             Specificity : 0.97746         
##          Pos Pred Value : 0.69512         
##          Neg Pred Value : 0.87419         
##              Prevalence : 0.16112         
##          Detection Rate : 0.04312         
##    Detection Prevalence : 0.06203         
##       Balanced Accuracy : 0.62253         
##                                           
##        'Positive' Class : Left            
## 
## Setting levels: control = Stayed, case = Left
## Setting direction: controls < cases
## AUC       = 0.825

feature_imp(glm_mod)
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: Attrition
## 
## Terms added sequentially (first to last)
## 
## 
##                         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                                     3087     2728.4              
## Age                      1   70.067      3086     2658.3 < 2.2e-16 ***
## BusinessTravel           2   50.880      3084     2607.4 8.946e-12 ***
## Department               2   33.530      3082     2573.9 5.237e-08 ***
## DistanceFromHome         1    0.001      3081     2573.9   0.98070    
## Education                4    9.497      3077     2564.4   0.04981 *  
## EducationField           5    7.849      3072     2556.5   0.16478    
## Gender                   1    1.281      3071     2555.3   0.25770    
## JobLevel                 4    5.836      3067     2549.4   0.21171    
## JobRole                  8   13.019      3059     2536.4   0.11120    
## MaritalStatus            2   61.618      3057     2474.8 4.166e-14 ***
## MonthlyIncome            1    2.335      3056     2472.4   0.12651    
## NumCompaniesWorked       1   45.589      3055     2426.9 1.459e-11 ***
## PercentSalaryHike        1    2.458      3054     2424.4   0.11691    
## StockOptionLevel         3    2.660      3051     2421.7   0.44700    
## TotalWorkingYears        1   41.635      3050     2380.1 1.100e-10 ***
## TrainingTimesLastYear    1    6.517      3049     2373.6   0.01069 *  
## YearsAtCompany           1    0.048      3048     2373.6   0.82728    
## YearsSinceLastPromotion  1   23.943      3047     2349.6 9.924e-07 ***
## YearsWithCurrManager     1   31.283      3046     2318.3 2.230e-08 ***
## EnvironmentSatisfaction  4   58.048      3042     2260.3 7.454e-12 ***
## JobSatisfaction          4   46.556      3038     2213.7 1.887e-09 ***
## WorkLifeBalance          4   26.197      3034     2187.5 2.888e-05 ***
## JobInvolvement           3    9.117      3031     2178.4   0.02777 *  
## PerformanceRating        1    0.408      3030     2178.0   0.52308    
## AvgHrs                   1  130.107      3029     2047.9 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Insights

  • Judged by its accuracy, specificity, and AUC, this model seems to perform well. However, as noted previously, this is misleading because the labels are imbalanced 84 to 16 against the positive class (Left); an accuracy of 0.8631 is therefore not much better than no model at all. The first thing we need to do is correct this imbalance.
  • This is also evident in our metric of interest, sensitivity, which was only 0.2676.
  • Our goal is to increase the count in the bottom-right cell of the confusion matrix (Prediction = Left, Reference = Left).
  • Several variables, such as DistanceFromHome and PerformanceRating, were not statistically significant (high p-values) or important in the initial model. At first glance, this agrees with the exploratory data analysis. As we improve the model, we will continue to check whether these observations are reflected in the predictive model.
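The no-information point can be made concrete: a "model" that always predicts the majority class already matches the ~84% baseline while catching no leavers at all. A minimal base-R sketch with hypothetical labels mirroring our 84/16 split:

```r
## Hypothetical labels with the same 84/16 Stayed/Left split as our data
labels <- c(rep("Stayed", 84), rep("Left", 16))

## A "model" that always predicts the majority class
preds <- rep("Stayed", length(labels))

## Accuracy equals the majority-class prevalence (the no-information rate)...
accuracy <- mean(preds == labels)   # 0.84

## ...but sensitivity for the positive class "Left" is zero
sensitivity <- sum(preds == "Left" & labels == "Left") / sum(labels == "Left")
```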

2. Perform Logistic Regression model tuning

Now that we have an idea what a baseline logistic model looks like, we can try to improve the model by feature selection and hyperparameter tuning.

Before we change the model itself, we must deal with sample imbalance by increasing the weight of “Left” cases.

## Create a weight vector for the training cases.
## Proportion of the training cases that Left
mean(dLM_train$Attrition == "Left") * 100
## [1] 16.12694
## 16% of the training data Left and 84% Stayed.

## Upweight the minority class roughly in inverse proportion to its frequency
weights = ifelse(dLM_train$Attrition == 'Left', 0.84, 0.16)
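Case weights (used below) are one way to counter the imbalance; resampling is another. As an aside not pursued here, the minority class could instead be oversampled before fitting, sketched in base R on a toy label vector (caret's upSample() implements the same idea for data frames):

```r
set.seed(42)
## Toy labels standing in for dLM_train$Attrition (hypothetical 84/16 split)
attrition <- factor(c(rep("Stayed", 84), rep("Left", 16)))

## Draw extra minority-class indices with replacement until classes balance
minority <- which(attrition == "Left")
extra    <- sample(minority, sum(attrition == "Stayed") - length(minority),
                   replace = TRUE)
balanced <- attrition[c(seq_along(attrition), extra)]

table(balanced)   # both classes now have 84 cases
```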

How does the model's performance change after correcting the imbalance?

## GLM with weights; the quasibinomial family avoids the "non-integer
## #successes" warning that binomial raises when case weights are fractional.
glm_mod_w = glm(Attrition ~ ., 
                     family = quasibinomial, data = dLM_train,
                     weights = weights)
dLM_test %<>% mutate(probs= predict(glm_mod_w, newdata=dLM_test, type = 'response'))

dLM_test = score_model(dLM_test, 0.5)
dLM_test[1:20, c('Attrition','probs','score')]
##    Attrition       probs  score
## 1       Left 0.665464915   Left
## 2     Stayed 0.009372544 Stayed
## 3     Stayed 0.580036990   Left
## 4       Left 0.389817745 Stayed
## 5     Stayed 0.626400169   Left
## 6     Stayed 0.503305464   Left
## 7     Stayed 0.357322937 Stayed
## 8     Stayed 0.249437938 Stayed
## 9     Stayed 0.967494887   Left
## 10    Stayed 0.167477506 Stayed
## 11    Stayed 0.317412158 Stayed
## 12    Stayed 0.309383026 Stayed
## 13      Left 0.699540461   Left
## 14    Stayed 0.255447103 Stayed
## 15    Stayed 0.564857332   Left
## 16    Stayed 0.094538473 Stayed
## 17    Stayed 0.456090380 Stayed
## 18    Stayed 0.476937353 Stayed
## 19    Stayed 0.201687751 Stayed
## 20    Stayed 0.497591008 Stayed
perf_met(dLM_test)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed    822   52
##     Left      287  161
##                                           
##                Accuracy : 0.7436          
##                  95% CI : (0.7191, 0.7669)
##     No Information Rate : 0.8389          
##     P-Value [Acc > NIR] : 1               
##                                           
##                   Kappa : 0.3438          
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.7559          
##             Specificity : 0.7412          
##          Pos Pred Value : 0.3594          
##          Neg Pred Value : 0.9405          
##              Prevalence : 0.1611          
##          Detection Rate : 0.1218          
##    Detection Prevalence : 0.3389          
##       Balanced Accuracy : 0.7485          
##                                           
##        'Positive' Class : Left            
## 
## Setting levels: control = Stayed, case = Left
## Setting direction: controls < cases
## AUC       = 0.819

feature_imp(glm_mod)
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: Attrition
## 
## Terms added sequentially (first to last)
## 
## 
##                         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                                     3087     2728.4              
## Age                      1   70.067      3086     2658.3 < 2.2e-16 ***
## BusinessTravel           2   50.880      3084     2607.4 8.946e-12 ***
## Department               2   33.530      3082     2573.9 5.237e-08 ***
## DistanceFromHome         1    0.001      3081     2573.9   0.98070    
## Education                4    9.497      3077     2564.4   0.04981 *  
## EducationField           5    7.849      3072     2556.5   0.16478    
## Gender                   1    1.281      3071     2555.3   0.25770    
## JobLevel                 4    5.836      3067     2549.4   0.21171    
## JobRole                  8   13.019      3059     2536.4   0.11120    
## MaritalStatus            2   61.618      3057     2474.8 4.166e-14 ***
## MonthlyIncome            1    2.335      3056     2472.4   0.12651    
## NumCompaniesWorked       1   45.589      3055     2426.9 1.459e-11 ***
## PercentSalaryHike        1    2.458      3054     2424.4   0.11691    
## StockOptionLevel         3    2.660      3051     2421.7   0.44700    
## TotalWorkingYears        1   41.635      3050     2380.1 1.100e-10 ***
## TrainingTimesLastYear    1    6.517      3049     2373.6   0.01069 *  
## YearsAtCompany           1    0.048      3048     2373.6   0.82728    
## YearsSinceLastPromotion  1   23.943      3047     2349.6 9.924e-07 ***
## YearsWithCurrManager     1   31.283      3046     2318.3 2.230e-08 ***
## EnvironmentSatisfaction  4   58.048      3042     2260.3 7.454e-12 ***
## JobSatisfaction          4   46.556      3038     2213.7 1.887e-09 ***
## WorkLifeBalance          4   26.197      3034     2187.5 2.888e-05 ***
## JobInvolvement           3    9.117      3031     2178.4   0.02777 *  
## PerformanceRating        1    0.408      3030     2178.0   0.52308    
## AvgHrs                   1  130.107      3029     2047.9 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Insights

  • Although accuracy, specificity, and AUC decreased slightly, sensitivity increased substantially (0.2676 → 0.7559). Therefore the weight correction is a step in the right direction.
  • The confusion matrix has also improved significantly: 161 actual leavers are now correctly flagged, up from 57.

We can now continue with the model tuning.
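Besides the model itself, the threshold passed to score_model is a tuning knob: lowering it flags more employees as leavers, trading specificity for the sensitivity we care about. A toy sketch of that trade-off (hypothetical probabilities, not dLM_test):

```r
## Hypothetical predicted probabilities and true labels
probs  <- c(0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05)
actual <- c("Left", "Left", "Stayed", "Left", "Stayed",
            "Stayed", "Stayed", "Stayed")

## Sensitivity and specificity at a given classification threshold
metrics_at <- function(th) {
  pred <- ifelse(probs < th, "Stayed", "Left")
  c(threshold   = th,
    sensitivity = sum(pred == "Left" & actual == "Left") /
                  sum(actual == "Left"),
    specificity = sum(pred == "Stayed" & actual == "Stayed") /
                  sum(actual == "Stayed"))
}

## Lowering the threshold catches more leavers at the cost of specificity
round(t(sapply(c(0.3, 0.5, 0.7), metrics_at)), 2)
```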

Feature Selection

One concern with a model that performs well is over-fitting. Reducing the number of features can help reduce multicollinearity and improve the model's ability to generalize.

Model 1. All features

glm_mod_1 = glm(Attrition ~ 
                  Age + 
                  BusinessTravel + 
                  Department + 
                  DistanceFromHome + 
                  Education +
                  EducationField + 
                  Gender + 
                  JobLevel + 
                  JobRole + 
                  MaritalStatus + 
                  MonthlyIncome + 
                  NumCompaniesWorked + 
                  PercentSalaryHike + 
                  StockOptionLevel + 
                  TotalWorkingYears +
                  TrainingTimesLastYear + 
                  YearsAtCompany + 
                  YearsSinceLastPromotion +
                  YearsWithCurrManager + 
                  EnvironmentSatisfaction + 
                  JobSatisfaction +
                  WorkLifeBalance +
                  JobInvolvement + 
                  PerformanceRating + 
                  AvgHrs, 
                data = dLM_train,
                family = quasibinomial, 
                weights = weights)
dLM_test$probs= predict(glm_mod_1, newdata=dLM_test, type = 'response')

dLM_test = score_model(dLM_test, 0.5)
dLM_test[1:10, c('Attrition','probs','score')]
##    Attrition       probs  score
## 1       Left 0.665464915   Left
## 2     Stayed 0.009372544 Stayed
## 3     Stayed 0.580036990   Left
## 4       Left 0.389817745 Stayed
## 5     Stayed 0.626400169   Left
## 6     Stayed 0.503305464   Left
## 7     Stayed 0.357322937 Stayed
## 8     Stayed 0.249437938 Stayed
## 9     Stayed 0.967494887   Left
## 10    Stayed 0.167477506 Stayed
perf_met(dLM_test)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed    822   52
##     Left      287  161
##                                           
##                Accuracy : 0.7436          
##                  95% CI : (0.7191, 0.7669)
##     No Information Rate : 0.8389          
##     P-Value [Acc > NIR] : 1               
##                                           
##                   Kappa : 0.3438          
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.7559          
##             Specificity : 0.7412          
##          Pos Pred Value : 0.3594          
##          Neg Pred Value : 0.9405          
##              Prevalence : 0.1611          
##          Detection Rate : 0.1218          
##    Detection Prevalence : 0.3389          
##       Balanced Accuracy : 0.7485          
##                                           
##        'Positive' Class : Left            
## 
## Setting levels: control = Stayed, case = Left
## Setting direction: controls < cases
## AUC       = 0.819

feature_imp(glm_mod_1)
## Analysis of Deviance Table
## 
## Model: quasibinomial, link: logit
## 
## Response: Attrition
## 
## Terms added sequentially (first to last)
## 
## 
##                         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                                     3087    1154.38              
## Age                      1   30.941      3086    1123.44 < 2.2e-16 ***
## BusinessTravel           2   24.042      3084    1099.39 < 2.2e-16 ***
## Department               2   15.516      3082    1083.88 3.194e-12 ***
## DistanceFromHome         1    0.019      3081    1083.86 0.8012431    
## Education                4    4.812      3077    1079.05 0.0025061 ** 
## EducationField           5    2.892      3072    1076.15 0.0790612 .  
## Gender                   1    0.821      3071    1075.33 0.0942239 .  
## JobLevel                 4    2.990      3067    1072.34 0.0371593 *  
## JobRole                  8    6.925      3059    1065.42 0.0026463 ** 
## MaritalStatus            2   29.558      3057    1035.86 < 2.2e-16 ***
## MonthlyIncome            1    1.156      3056    1034.70 0.0469944 *  
## NumCompaniesWorked       1   23.606      3055    1011.10 < 2.2e-16 ***
## PercentSalaryHike        1    0.982      3054    1010.12 0.0672017 .  
## StockOptionLevel         3    1.360      3051    1008.76 0.2000052    
## TotalWorkingYears        1   15.570      3050     993.19 3.130e-13 ***
## TrainingTimesLastYear    1    6.100      3049     987.09 5.066e-06 ***
## YearsAtCompany           1    0.319      3048     986.77 0.2968302    
## YearsSinceLastPromotion  1    8.872      3047     977.90 3.758e-08 ***
## YearsWithCurrManager     1   24.768      3046     953.13 < 2.2e-16 ***
## EnvironmentSatisfaction  4   26.940      3042     926.19 < 2.2e-16 ***
## JobSatisfaction          4   21.179      3038     905.01 7.551e-15 ***
## WorkLifeBalance          4   11.976      3034     893.03 2.872e-08 ***
## JobInvolvement           3    6.166      3031     886.87 0.0001033 ***
## PerformanceRating        1    0.389      3030     886.48 0.2493721    
## AvgHrs                   1   52.665      3029     833.81 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Model 1 Performance

Accuracy 0.7436
Sensitivity 0.7559
Specificity 0.7412
AUC 0.819


As a next step, I will remove DistanceFromHome from the model features, because its high p-value indicates it is not statistically significant and it was not deemed an important variable.
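Rather than retyping the full formula for every candidate model, each backward step can also be expressed with stats::update(), which re-fits an existing model with one term dropped. A toy sketch on hypothetical data (in this notebook the same call would be applied to glm_mod_1):

```r
## Toy data standing in for the HR dataset (hypothetical columns)
toy <- data.frame(y = c(0, 1, 0, 1, 1, 0, 1, 0),
                  a = c(1, 2, 3, 4, 5, 6, 7, 8),
                  b = c(2, 1, 4, 3, 6, 5, 8, 7))

fit_full    <- glm(y ~ a + b, data = toy, family = binomial)
fit_reduced <- update(fit_full, . ~ . - b)   # same call, minus one term
formula(fit_reduced)   # y ~ a
```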

Model 2. Remove DistanceFromHome

glm_mod_2 = glm(Attrition ~ 
                  Age + 
                  BusinessTravel + 
                  Department + 
                  # DistanceFromHome + 
                  Education +
                  EducationField + 
                  Gender + 
                  JobLevel + 
                  JobRole + 
                  MaritalStatus + 
                  MonthlyIncome + 
                  NumCompaniesWorked + 
                  PercentSalaryHike + 
                  StockOptionLevel + 
                  TotalWorkingYears +
                  TrainingTimesLastYear + 
                  YearsAtCompany + 
                  YearsSinceLastPromotion +
                  YearsWithCurrManager + 
                  EnvironmentSatisfaction + 
                  JobSatisfaction +
                  WorkLifeBalance +
                  JobInvolvement + 
                  PerformanceRating + 
                  AvgHrs, 
                data = dLM_train,
                family = quasibinomial, 
                weights = weights)
dLM_test$probs= predict(glm_mod_2, newdata=dLM_test, type = 'response')
dLM_test = score_model(dLM_test, 0.5)
dLM_test[1:10, c('Attrition','probs','score')]
##    Attrition       probs  score
## 1       Left 0.664231177   Left
## 2     Stayed 0.009255964 Stayed
## 3     Stayed 0.585345845   Left
## 4       Left 0.385019667 Stayed
## 5     Stayed 0.622988520   Left
## 6     Stayed 0.496550765 Stayed
## 7     Stayed 0.352931631 Stayed
## 8     Stayed 0.253555889 Stayed
## 9     Stayed 0.967401478   Left
## 10    Stayed 0.166238289 Stayed
perf_met(dLM_test)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed    825   52
##     Left      284  161
##                                           
##                Accuracy : 0.7458          
##                  95% CI : (0.7215, 0.7691)
##     No Information Rate : 0.8389          
##     P-Value [Acc > NIR] : 1               
##                                           
##                   Kappa : 0.3471          
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.7559          
##             Specificity : 0.7439          
##          Pos Pred Value : 0.3618          
##          Neg Pred Value : 0.9407          
##              Prevalence : 0.1611          
##          Detection Rate : 0.1218          
##    Detection Prevalence : 0.3366          
##       Balanced Accuracy : 0.7499          
##                                           
##        'Positive' Class : Left            
## 
## Setting levels: control = Stayed, case = Left
## Setting direction: controls < cases
## AUC       = 0.819

feature_imp(glm_mod_2)
## Analysis of Deviance Table
## 
## Model: quasibinomial, link: logit
## 
## Response: Attrition
## 
## Terms added sequentially (first to last)
## 
## 
##                         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                                     3087    1154.38              
## Age                      1   30.941      3086    1123.44 < 2.2e-16 ***
## BusinessTravel           2   24.042      3084    1099.39 < 2.2e-16 ***
## Department               2   15.516      3082    1083.88 2.891e-12 ***
## Education                4    4.788      3078    1079.09 0.0025290 ** 
## EducationField           5    2.931      3073    1076.16 0.0741386 .  
## Gender                   1    0.800      3072    1075.36 0.0978385 .  
## JobLevel                 4    2.994      3068    1072.36 0.0363614 *  
## JobRole                  8    6.940      3060    1065.42 0.0025068 ** 
## MaritalStatus            2   29.506      3058    1035.92 < 2.2e-16 ***
## MonthlyIncome            1    1.168      3057    1034.75 0.0454666 *  
## NumCompaniesWorked       1   23.410      3056    1011.34 < 2.2e-16 ***
## PercentSalaryHike        1    1.008      3055    1010.33 0.0631103 .  
## StockOptionLevel         3    1.360      3052    1008.97 0.1986681    
## TotalWorkingYears        1   15.653      3051     993.32 2.449e-13 ***
## TrainingTimesLastYear    1    6.100      3050     987.22 4.864e-06 ***
## YearsAtCompany           1    0.346      3049     986.87 0.2764910    
## YearsSinceLastPromotion  1    8.859      3048     978.01 3.626e-08 ***
## YearsWithCurrManager     1   24.716      3047     953.30 < 2.2e-16 ***
## EnvironmentSatisfaction  4   27.028      3043     926.27 < 2.2e-16 ***
## JobSatisfaction          4   21.249      3039     905.02 5.892e-15 ***
## WorkLifeBalance          4   11.984      3035     893.04 2.633e-08 ***
## JobInvolvement           3    6.162      3032     886.88 0.0001002 ***
## PerformanceRating        1    0.389      3031     886.49 0.2482443    
## AvgHrs                   1   52.626      3030     833.86 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Model 2 Performance

Accuracy 0.7458
Sensitivity 0.7559
Specificity 0.7439
AUC 0.819


  • The metric values barely changed, which confirms that DistanceFromHome contributed little to the model. We will therefore keep it out.


Model 3. Remove StockOptionLevel

glm_mod_3 = glm(Attrition ~ 
                  Age + 
                  BusinessTravel + 
                  Department + 
                  # DistanceFromHome + 
                  Education +
                  EducationField + 
                  Gender + 
                  JobLevel + 
                  JobRole + 
                  MaritalStatus + 
                  MonthlyIncome + 
                  NumCompaniesWorked + 
                  PercentSalaryHike + 
                  # StockOptionLevel + 
                  TotalWorkingYears +
                  TrainingTimesLastYear + 
                  YearsAtCompany + 
                  YearsSinceLastPromotion +
                  YearsWithCurrManager + 
                  EnvironmentSatisfaction + 
                  JobSatisfaction +
                  WorkLifeBalance +
                  JobInvolvement + 
                  PerformanceRating + 
                  AvgHrs, 
                data = dLM_train,
                family = quasibinomial, 
                weights = weights)
dLM_test$probs= predict(glm_mod_3, newdata=dLM_test, type = 'response')
dLM_test = score_model(dLM_test, 0.5)
dLM_test[1:10, c('Attrition','probs','score')]
##    Attrition       probs  score
## 1       Left 0.671471866   Left
## 2     Stayed 0.009654317 Stayed
## 3     Stayed 0.595584262   Left
## 4       Left 0.359600849 Stayed
## 5     Stayed 0.625388900   Left
## 6     Stayed 0.491898556 Stayed
## 7     Stayed 0.347320359 Stayed
## 8     Stayed 0.261389962 Stayed
## 9     Stayed 0.967605237   Left
## 10    Stayed 0.164727824 Stayed
perf_met(dLM_test)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed    829   50
##     Left      280  163
##                                           
##                Accuracy : 0.7504          
##                  95% CI : (0.7261, 0.7735)
##     No Information Rate : 0.8389          
##     P-Value [Acc > NIR] : 1               
##                                           
##                   Kappa : 0.357           
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.7653          
##             Specificity : 0.7475          
##          Pos Pred Value : 0.3679          
##          Neg Pred Value : 0.9431          
##              Prevalence : 0.1611          
##          Detection Rate : 0.1233          
##    Detection Prevalence : 0.3351          
##       Balanced Accuracy : 0.7564          
##                                           
##        'Positive' Class : Left            
## 
## Setting levels: control = Stayed, case = Left
## Setting direction: controls < cases
## AUC       = 0.819

feature_imp(glm_mod_3)
## Analysis of Deviance Table
## 
## Model: quasibinomial, link: logit
## 
## Response: Attrition
## 
## Terms added sequentially (first to last)
## 
## 
##                         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                                     3087    1154.38              
## Age                      1   30.941      3086    1123.44 < 2.2e-16 ***
## BusinessTravel           2   24.042      3084    1099.39 < 2.2e-16 ***
## Department               2   15.516      3082    1083.88 2.966e-12 ***
## Education                4    4.788      3078    1079.09  0.002547 ** 
## EducationField           5    2.931      3073    1076.16  0.074410 .  
## Gender                   1    0.800      3072    1075.36  0.098001 .  
## JobLevel                 4    2.994      3068    1072.36  0.036512 *  
## JobRole                  8    6.940      3060    1065.42  0.002529 ** 
## MaritalStatus            2   29.506      3058    1035.92 < 2.2e-16 ***
## MonthlyIncome            1    1.168      3057    1034.75  0.045571 *  
## NumCompaniesWorked       1   23.410      3056    1011.34 < 2.2e-16 ***
## PercentSalaryHike        1    1.008      3055    1010.33  0.063238 .  
## TotalWorkingYears        1   15.263      3054     995.07 4.958e-13 ***
## TrainingTimesLastYear    1    6.337      3053     988.73 3.216e-06 ***
## YearsAtCompany           1    0.341      3052     988.39  0.280200    
## YearsSinceLastPromotion  1    8.799      3051     979.59 4.092e-08 ***
## YearsWithCurrManager     1   24.446      3050     955.15 < 2.2e-16 ***
## EnvironmentSatisfaction  4   27.763      3046     927.38 < 2.2e-16 ***
## JobSatisfaction          4   21.309      3042     906.07 5.515e-15 ***
## WorkLifeBalance          4   12.015      3038     894.06 2.551e-08 ***
## JobInvolvement           3    6.515      3035     887.54 5.671e-05 ***
## PerformanceRating        1    0.407      3034     887.14  0.238261    
## AvgHrs                   1   52.917      3033     834.22 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Model 3 Performance

Accuracy 0.7504
Sensitivity 0.7653
Specificity 0.7475
AUC 0.819


  • All metrics improved, so we will keep StockOptionLevel out of the model.
  • I will continue this process feature by feature to arrive at the simplest model.

Model 4. Remove YearsAtCompany

glm_mod_4 = glm(Attrition ~ 
                  Age + 
                  BusinessTravel + 
                  Department + 
                  # DistanceFromHome + 
                  Education +
                  EducationField + 
                  Gender + 
                  JobLevel + 
                  JobRole + 
                  MaritalStatus + 
                  MonthlyIncome + 
                  NumCompaniesWorked + 
                  PercentSalaryHike + 
                  # StockOptionLevel + 
                  TotalWorkingYears +
                  TrainingTimesLastYear + 
                  # YearsAtCompany + 
                  YearsSinceLastPromotion +
                  YearsWithCurrManager + 
                  EnvironmentSatisfaction + 
                  JobSatisfaction +
                  WorkLifeBalance +
                  JobInvolvement + 
                  PerformanceRating + 
                  AvgHrs, 
                data = dLM_train,
                family = quasibinomial, 
                weights = weights)
dLM_test$probs= predict(glm_mod_4, newdata=dLM_test, type = 'response')
dLM_test = score_model(dLM_test, 0.5)
dLM_test[1:10, c('Attrition','probs','score')]
##    Attrition      probs  score
## 1       Left 0.67704493   Left
## 2     Stayed 0.01264084 Stayed
## 3     Stayed 0.66417158   Left
## 4       Left 0.41947427 Stayed
## 5     Stayed 0.60319744   Left
## 6     Stayed 0.48401040 Stayed
## 7     Stayed 0.23249550 Stayed
## 8     Stayed 0.25094262 Stayed
## 9     Stayed 0.96591209   Left
## 10    Stayed 0.12816160 Stayed
perf_met(dLM_test)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed    813   52
##     Left      296  161
##                                           
##                Accuracy : 0.7368          
##                  95% CI : (0.7121, 0.7603)
##     No Information Rate : 0.8389          
##     P-Value [Acc > NIR] : 1               
##                                           
##                   Kappa : 0.3343          
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.7559          
##             Specificity : 0.7331          
##          Pos Pred Value : 0.3523          
##          Neg Pred Value : 0.9399          
##              Prevalence : 0.1611          
##          Detection Rate : 0.1218          
##    Detection Prevalence : 0.3457          
##       Balanced Accuracy : 0.7445          
##                                           
##        'Positive' Class : Left            
## 
## Setting levels: control = Stayed, case = Left
## Setting direction: controls < cases
## AUC       = 0.82

feature_imp(glm_mod_4)
## Analysis of Deviance Table
## 
## Model: quasibinomial, link: logit
## 
## Response: Attrition
## 
## Terms added sequentially (first to last)
## 
## 
##                         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                                     3087    1154.38              
## Age                      1   30.941      3086    1123.44 < 2.2e-16 ***
## BusinessTravel           2   24.042      3084    1099.39 7.865e-16 ***
## Department               2   15.516      3082    1083.88 1.788e-10 ***
## Education                4    4.788      3078    1079.09 0.0077808 ** 
## EducationField           5    2.931      3073    1076.16 0.1316630    
## Gender                   1    0.800      3072    1075.36 0.1281293    
## JobLevel                 4    2.994      3068    1072.36 0.0701219 .  
## JobRole                  8    6.940      3060    1065.42 0.0100451 *  
## MaritalStatus            2   29.506      3058    1035.92 < 2.2e-16 ***
## MonthlyIncome            1    1.168      3057    1034.75 0.0659872 .  
## NumCompaniesWorked       1   23.410      3056    1011.34 < 2.2e-16 ***
## PercentSalaryHike        1    1.008      3055    1010.33 0.0876195 .  
## TotalWorkingYears        1   15.263      3054     995.07 3.030e-11 ***
## TrainingTimesLastYear    1    6.337      3053     988.73 1.853e-05 ***
## YearsSinceLastPromotion  1    8.107      3052     980.62 1.279e-06 ***
## YearsWithCurrManager     1   22.190      3051     958.43 1.125e-15 ***
## EnvironmentSatisfaction  4   28.146      3047     930.29 < 2.2e-16 ***
## JobSatisfaction          4   19.603      3043     910.69 1.420e-11 ***
## WorkLifeBalance          4   11.647      3039     899.04 8.600e-07 ***
## JobInvolvement           3    6.900      3036     892.14 0.0001728 ***
## PerformanceRating        1    0.440      3035     891.70 0.2590968    
## AvgHrs                   1   51.502      3034     840.20 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Model 4 Performance

Accuracy 0.7368
Sensitivity 0.7559
Specificity 0.7331
AUC 0.82


  • All metrics deteriorated, so we will add YearsAtCompany back and remove a different feature instead.
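To keep the elimination bookkeeping honest, it helps to collect the test metrics reported for each model in one table. A sketch using the numbers from Models 1–4 above:

```r
## Test metrics reported for Models 1-4 in this notebook
perf <- data.frame(
  model       = c("all features", "- DistanceFromHome",
                  "- StockOptionLevel", "- YearsAtCompany"),
  accuracy    = c(0.7436, 0.7458, 0.7504, 0.7368),
  sensitivity = c(0.7559, 0.7559, 0.7653, 0.7559),
  specificity = c(0.7412, 0.7439, 0.7475, 0.7331))

## Model 3 (StockOptionLevel removed) leads on accuracy so far
perf[which.max(perf$accuracy), "model"]
```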

Model 5. Add YearsAtCompany back and remove PerformanceRating

glm_mod_5 = glm(Attrition ~ 
                  Age + 
                  BusinessTravel + 
                  Department + 
                  # DistanceFromHome + 
                  Education +
                  EducationField + 
                  Gender + 
                  JobLevel + 
                  JobRole + 
                  MaritalStatus + 
                  MonthlyIncome + 
                  NumCompaniesWorked + 
                  PercentSalaryHike + 
                  # StockOptionLevel + 
                  TotalWorkingYears +
                  TrainingTimesLastYear + 
                  YearsAtCompany + 
                  YearsSinceLastPromotion +
                  YearsWithCurrManager + 
                  EnvironmentSatisfaction + 
                  JobSatisfaction +
                  WorkLifeBalance +
                  JobInvolvement + 
                  # PerformanceRating + 
                  AvgHrs, 
                data = dLM_train,
                family = quasibinomial, 
                weights = weights)

dLM_test$probs= predict(glm_mod_5, newdata=dLM_test, type = 'response')
dLM_test = score_model(dLM_test, 0.5)
dLM_test[1:10, c('Attrition','probs','score')]
##    Attrition       probs  score
## 1       Left 0.673259822   Left
## 2     Stayed 0.009846031 Stayed
## 3     Stayed 0.600008249   Left
## 4       Left 0.361518746 Stayed
## 5     Stayed 0.626724031   Left
## 6     Stayed 0.488329322 Stayed
## 7     Stayed 0.346870581 Stayed
## 8     Stayed 0.262715362 Stayed
## 9     Stayed 0.967970060   Left
## 10    Stayed 0.165483647 Stayed
perf_met(dLM_test)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed    829   49
##     Left      280  164
##                                           
##                Accuracy : 0.7511          
##                  95% CI : (0.7269, 0.7742)
##     No Information Rate : 0.8389          
##     P-Value [Acc > NIR] : 1               
##                                           
##                   Kappa : 0.3598          
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.7700          
##             Specificity : 0.7475          
##          Pos Pred Value : 0.3694          
##          Neg Pred Value : 0.9442          
##              Prevalence : 0.1611          
##          Detection Rate : 0.1241          
##    Detection Prevalence : 0.3359          
##       Balanced Accuracy : 0.7587          
##                                           
##        'Positive' Class : Left            
## 
## Setting levels: control = Stayed, case = Left
## Setting direction: controls < cases
## AUC       = 0.819

feature_imp(glm_mod_5)
## Analysis of Deviance Table
## 
## Model: quasibinomial, link: logit
## 
## Response: Attrition
## 
## Terms added sequentially (first to last)
## 
## 
##                         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                                     3087    1154.38              
## Age                      1   30.941      3086    1123.44 < 2.2e-16 ***
## BusinessTravel           2   24.042      3084    1099.39 < 2.2e-16 ***
## Department               2   15.516      3082    1083.88 2.949e-12 ***
## Education                4    4.788      3078    1079.09  0.002543 ** 
## EducationField           5    2.931      3073    1076.16  0.074348 .  
## Gender                   1    0.800      3072    1075.36  0.097964 .  
## JobLevel                 4    2.994      3068    1072.36  0.036478 *  
## JobRole                  8    6.940      3060    1065.42  0.002524 ** 
## MaritalStatus            2   29.506      3058    1035.92 < 2.2e-16 ***
## MonthlyIncome            1    1.168      3057    1034.75  0.045547 *  
## NumCompaniesWorked       1   23.410      3056    1011.34 < 2.2e-16 ***
## PercentSalaryHike        1    1.008      3055    1010.33  0.063209 .  
## TotalWorkingYears        1   15.263      3054     995.07 4.929e-13 ***
## TrainingTimesLastYear    1    6.337      3053     988.73 3.208e-06 ***
## YearsAtCompany           1    0.341      3052     988.39  0.280147    
## YearsSinceLastPromotion  1    8.799      3051     979.59 4.078e-08 ***
## YearsWithCurrManager     1   24.446      3050     955.15 < 2.2e-16 ***
## EnvironmentSatisfaction  4   27.763      3046     927.38 < 2.2e-16 ***
## JobSatisfaction          4   21.309      3042     906.07 5.472e-15 ***
## WorkLifeBalance          4   12.015      3038     894.06 2.540e-08 ***
## JobInvolvement           3    6.515      3035     887.54 5.657e-05 ***
## AvgHrs                   1   53.308      3034     834.23 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Model 5 Performance

Accuracy 0.7511
Sensitivity 0.7700
Specificity 0.7475
AUC 0.819
  • All metrics improved notably except for AUC, which did not change. However, the p-value for YearsAtCompany remained large, so we will remove it again.
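The sequential deviance tables printed by `feature_imp()` presumably wrap base R's `anova()`; the same table can be produced directly. A minimal sketch, assuming `glm_mod_5` is fitted as above:

```r
# Sequential (Type I) analysis of deviance: each term is tested as it is
# added, in formula order, so the table depends on term ordering.
anova(glm_mod_5, test = "Chisq")   # matches the Pr(>Chi) column shown above

# With a quasi- family the dispersion parameter is estimated, so an
# F test is often preferred:
anova(glm_mod_5, test = "F")
```

The p-value for a single term (e.g. YearsAtCompany) can then be read off the returned table by row name.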
Model 6. Remove YearsAtCompany again, along with EducationField (p ≈ 0.07 in Model 5)
glm_mod_6 = glm(Attrition ~ 
                  Age + 
                  BusinessTravel + 
                  Department + 
                  # DistanceFromHome + 
                  Education +
                  # EducationField + 
                  Gender + 
                  JobLevel + 
                  JobRole + 
                  MaritalStatus + 
                  MonthlyIncome + 
                  NumCompaniesWorked + 
                  PercentSalaryHike + 
                  # StockOptionLevel + 
                  TotalWorkingYears +
                  TrainingTimesLastYear + 
                  # YearsAtCompany + 
                  YearsSinceLastPromotion +
                  YearsWithCurrManager + 
                  EnvironmentSatisfaction + 
                  JobSatisfaction +
                  WorkLifeBalance +
                  JobInvolvement + 
                  # PerformanceRating + 
                  AvgHrs, 
                data = dLM_train,
                family = quasibinomial, 
                weights = weights)

dLM_test$probs = predict(glm_mod_6, newdata = dLM_test, type = 'response')
dLM_test = score_model(dLM_test, 0.5)
dLM_test[1:10, c('Attrition','probs','score')]
##    Attrition     probs  score
## 1       Left 0.6854979   Left
## 2     Stayed 0.0121785 Stayed
## 3     Stayed 0.6581509   Left
## 4       Left 0.3946230 Stayed
## 5     Stayed 0.6158727   Left
## 6     Stayed 0.5910461   Left
## 7     Stayed 0.2137386 Stayed
## 8     Stayed 0.2353871 Stayed
## 9     Stayed 0.9664472   Left
## 10    Stayed 0.1458530 Stayed
perf_met(dLM_test)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed    804   50
##     Left      305  163
##                                           
##                Accuracy : 0.7315          
##                  95% CI : (0.7067, 0.7552)
##     No Information Rate : 0.8389          
##     P-Value [Acc > NIR] : 1               
##                                           
##                   Kappa : 0.3304          
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.7653          
##             Specificity : 0.7250          
##          Pos Pred Value : 0.3483          
##          Neg Pred Value : 0.9415          
##              Prevalence : 0.1611          
##          Detection Rate : 0.1233          
##    Detection Prevalence : 0.3540          
##       Balanced Accuracy : 0.7451          
##                                           
##        'Positive' Class : Left            
## 
## Setting levels: control = Stayed, case = Left
## Setting direction: controls < cases
## AUC       = 0.82

feature_imp(glm_mod_6)
## Analysis of Deviance Table
## 
## Model: quasibinomial, link: logit
## 
## Response: Attrition
## 
## Terms added sequentially (first to last)
## 
## 
##                         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                                     3087    1154.38              
## Age                      1   30.941      3086    1123.44 < 2.2e-16 ***
## BusinessTravel           2   24.042      3084    1099.39 5.549e-16 ***
## Department               2   15.516      3082    1083.88 1.427e-10 ***
## Education                4    4.788      3078    1079.09 0.0073223 ** 
## Gender                   1    0.737      3077    1078.35 0.1422847    
## JobLevel                 4    3.442      3073    1074.91 0.0394413 *  
## JobRole                  8    6.802      3065    1068.11 0.0108138 *  
## MaritalStatus            2   29.317      3063    1038.79 < 2.2e-16 ***
## MonthlyIncome            1    1.172      3062    1037.62 0.0642607 .  
## NumCompaniesWorked       1   22.497      3061    1015.12 5.143e-16 ***
## PercentSalaryHike        1    1.030      3060    1014.09 0.0827544 .  
## TotalWorkingYears        1   15.169      3059     998.92 2.779e-11 ***
## TrainingTimesLastYear    1    6.145      3058     992.78 2.261e-05 ***
## YearsSinceLastPromotion  1    7.323      3057     985.46 3.729e-06 ***
## YearsWithCurrManager     1   22.400      3056     963.06 5.941e-16 ***
## EnvironmentSatisfaction  4   27.719      3052     935.34 < 2.2e-16 ***
## JobSatisfaction          4   20.177      3048     915.16 4.797e-12 ***
## WorkLifeBalance          4   12.165      3044     903.00 3.585e-07 ***
## JobInvolvement           3    6.712      3041     896.28 0.0002041 ***
## AvgHrs                   1   52.532      3040     843.75 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Model 6 Performance

Accuracy 0.7315
Sensitivity 0.7653
Specificity 0.7250
AUC 0.82


Model 7. Remove Gender, which was not significant in Model 6 (p ≈ 0.14)
glm_mod_7 = glm(Attrition ~ 
                  Age + 
                  BusinessTravel + 
                  Department + 
                  # DistanceFromHome + 
                  Education +
                  # EducationField + 
                  # Gender + 
                  JobLevel + 
                  JobRole + 
                  MaritalStatus + 
                  MonthlyIncome + 
                  NumCompaniesWorked + 
                  PercentSalaryHike + 
                  # StockOptionLevel + 
                  TotalWorkingYears +
                  TrainingTimesLastYear + 
                  # YearsAtCompany + 
                  YearsSinceLastPromotion +
                  YearsWithCurrManager + 
                  EnvironmentSatisfaction + 
                  JobSatisfaction +
                  WorkLifeBalance +
                  JobInvolvement + 
                  # PerformanceRating + 
                  AvgHrs, 
                data = dLM_train,
                family = quasibinomial, 
                weights = weights)

dLM_test$probs = predict(glm_mod_7, newdata = dLM_test, type = 'response')
dLM_test = score_model(dLM_test, 0.5)
dLM_test[1:10, c('Attrition','probs','score')]
##    Attrition      probs  score
## 1       Left 0.69424825   Left
## 2     Stayed 0.01176851 Stayed
## 3     Stayed 0.65325062   Left
## 4       Left 0.39006665 Stayed
## 5     Stayed 0.60879219   Left
## 6     Stayed 0.58609442   Left
## 7     Stayed 0.20657558 Stayed
## 8     Stayed 0.23000728 Stayed
## 9     Stayed 0.96820633   Left
## 10    Stayed 0.14285659 Stayed
perf_met(dLM_test)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed    806   49
##     Left      303  164
##                                          
##                Accuracy : 0.7337         
##                  95% CI : (0.709, 0.7574)
##     No Information Rate : 0.8389         
##     P-Value [Acc > NIR] : 1              
##                                          
##                   Kappa : 0.3352         
##                                          
##  Mcnemar's Test P-Value : <2e-16         
##                                          
##             Sensitivity : 0.7700         
##             Specificity : 0.7268         
##          Pos Pred Value : 0.3512         
##          Neg Pred Value : 0.9427         
##              Prevalence : 0.1611         
##          Detection Rate : 0.1241         
##    Detection Prevalence : 0.3533         
##       Balanced Accuracy : 0.7484         
##                                          
##        'Positive' Class : Left           
## 
## Setting levels: control = Stayed, case = Left
## Setting direction: controls < cases
## AUC       = 0.82

feature_imp(glm_mod_7)
## Analysis of Deviance Table
## 
## Model: quasibinomial, link: logit
## 
## Response: Attrition
## 
## Terms added sequentially (first to last)
## 
## 
##                         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                                     3087    1154.38              
## Age                      1   30.941      3086    1123.44 < 2.2e-16 ***
## BusinessTravel           2   24.042      3084    1099.39 7.129e-16 ***
## Department               2   15.516      3082    1083.88 1.678e-10 ***
## Education                4    4.788      3078    1079.09 0.0076488 ** 
## JobLevel                 4    3.267      3074    1075.82 0.0502004 .  
## JobRole                  8    6.841      3066    1068.98 0.0109263 *  
## MaritalStatus            2   29.344      3064    1039.64 < 2.2e-16 ***
## MonthlyIncome            1    1.132      3063    1038.51 0.0699732 .  
## NumCompaniesWorked       1   21.811      3062    1016.69 1.792e-15 ***
## PercentSalaryHike        1    1.034      3061    1015.66 0.0832511 .  
## TotalWorkingYears        1   15.258      3060    1000.40 2.862e-11 ***
## TrainingTimesLastYear    1    6.287      3059     994.12 1.948e-05 ***
## YearsSinceLastPromotion  1    7.198      3058     986.92 4.878e-06 ***
## YearsWithCurrManager     1   22.859      3057     964.06 3.832e-16 ***
## EnvironmentSatisfaction  4   27.905      3053     936.15 < 2.2e-16 ***
## JobSatisfaction          4   20.346      3049     915.81 4.639e-12 ***
## WorkLifeBalance          4   12.290      3045     903.52 3.405e-07 ***
## JobInvolvement           3    6.648      3042     896.87 0.0002384 ***
## AvgHrs                   1   52.974      3041     843.90 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Model 7 Performance

Accuracy 0.7337
Sensitivity 0.7700
Specificity 0.7268
AUC 0.82


Model 8. Remove PercentSalaryHike, which was only marginally significant in Model 7 (p ≈ 0.08)

glm_mod_8 = glm(Attrition ~ 
                  Age + 
                  BusinessTravel + 
                  Department + 
                  # DistanceFromHome + 
                  Education +
                  # EducationField + 
                  # Gender + 
                  JobLevel + 
                  JobRole + 
                  MaritalStatus + 
                  MonthlyIncome + 
                  NumCompaniesWorked + 
                  # PercentSalaryHike + 
                  # StockOptionLevel + 
                  TotalWorkingYears +
                  TrainingTimesLastYear + 
                  # YearsAtCompany + 
                  YearsSinceLastPromotion +
                  YearsWithCurrManager + 
                  EnvironmentSatisfaction + 
                  JobSatisfaction +
                  WorkLifeBalance +
                  JobInvolvement + 
                  # PerformanceRating + 
                  AvgHrs, 
                data = dLM_train,
                family = quasibinomial, 
                weights = weights)

dLM_test$probs = predict(glm_mod_8, newdata = dLM_test, type = 'response')
dLM_test = score_model(dLM_test, 0.5)
dLM_test[1:10, c('Attrition','probs','score')]
##    Attrition      probs  score
## 1       Left 0.64931169   Left
## 2     Stayed 0.01295795 Stayed
## 3     Stayed 0.60477348   Left
## 4       Left 0.42582268 Stayed
## 5     Stayed 0.63026618   Left
## 6     Stayed 0.57631750   Left
## 7     Stayed 0.20683100 Stayed
## 8     Stayed 0.25633616 Stayed
## 9     Stayed 0.95979072   Left
## 10    Stayed 0.16044485 Stayed
perf_met(dLM_test)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed    806   51
##     Left      303  162
##                                           
##                Accuracy : 0.7322          
##                  95% CI : (0.7075, 0.7559)
##     No Information Rate : 0.8389          
##     P-Value [Acc > NIR] : 1               
##                                           
##                   Kappa : 0.3297          
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.7606          
##             Specificity : 0.7268          
##          Pos Pred Value : 0.3484          
##          Neg Pred Value : 0.9405          
##              Prevalence : 0.1611          
##          Detection Rate : 0.1225          
##    Detection Prevalence : 0.3517          
##       Balanced Accuracy : 0.7437          
##                                           
##        'Positive' Class : Left            
## 
## Setting levels: control = Stayed, case = Left
## Setting direction: controls < cases
## AUC       = 0.82

feature_imp(glm_mod_8)
## Analysis of Deviance Table
## 
## Model: quasibinomial, link: logit
## 
## Response: Attrition
## 
## Terms added sequentially (first to last)
## 
## 
##                         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                                     3087    1154.38              
## Age                      1   30.941      3086    1123.44 < 2.2e-16 ***
## BusinessTravel           2   24.042      3084    1099.39 6.842e-15 ***
## Department               2   15.516      3082    1083.88 7.221e-10 ***
## Education                4    4.788      3078    1079.09  0.011320 *  
## JobLevel                 4    3.267      3074    1075.82  0.064604 .  
## JobRole                  8    6.841      3066    1068.98  0.017391 *  
## MaritalStatus            2   29.344      3064    1039.64 < 2.2e-16 ***
## MonthlyIncome            1    1.132      3063    1038.51  0.079714 .  
## NumCompaniesWorked       1   21.811      3062    1016.69 1.440e-14 ***
## TotalWorkingYears        1   15.497      3061    1001.20 8.908e-11 ***
## TrainingTimesLastYear    1    6.497      3060     994.70 2.687e-05 ***
## YearsSinceLastPromotion  1    7.185      3059     987.52 1.009e-05 ***
## YearsWithCurrManager     1   22.327      3058     965.19 7.073e-15 ***
## EnvironmentSatisfaction  4   27.119      3054     938.07 3.979e-15 ***
## JobSatisfaction          4   19.733      3050     918.34 6.569e-11 ***
## WorkLifeBalance          4   12.670      3046     905.67 6.241e-07 ***
## JobInvolvement           3    6.459      3043     899.21  0.000551 ***
## AvgHrs                   1   53.772      3042     845.44 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Model 8 Performance

Accuracy 0.7322
Sensitivity 0.7606
Specificity 0.7268
AUC 0.82


Model 9. Remove JobLevel, which was only marginally significant in Model 8 (p ≈ 0.06)

glm_mod_9 = glm(Attrition ~ 
                  Age + 
                  BusinessTravel + 
                  Department + 
                  # DistanceFromHome + 
                  Education +
                  # EducationField + 
                  # Gender + 
                  # JobLevel + 
                  JobRole + 
                  MaritalStatus + 
                  MonthlyIncome + 
                  NumCompaniesWorked + 
                  # PercentSalaryHike + 
                  # StockOptionLevel + 
                  TotalWorkingYears +
                  TrainingTimesLastYear + 
                  # YearsAtCompany + 
                  YearsSinceLastPromotion +
                  YearsWithCurrManager + 
                  EnvironmentSatisfaction + 
                  JobSatisfaction +
                  WorkLifeBalance +
                  JobInvolvement + 
                  # PerformanceRating + 
                  AvgHrs, 
                data = dLM_train,
                family = quasibinomial, 
                weights = weights)

dLM_test$probs = predict(glm_mod_9, newdata = dLM_test, type = 'response')
dLM_test = score_model(dLM_test, 0.5)
dLM_test[1:10, c('Attrition','probs','score')]
##    Attrition      probs  score
## 1       Left 0.66321747   Left
## 2     Stayed 0.01608664 Stayed
## 3     Stayed 0.53482812   Left
## 4       Left 0.42939576 Stayed
## 5     Stayed 0.64524716   Left
## 6     Stayed 0.53042227   Left
## 7     Stayed 0.23004766 Stayed
## 8     Stayed 0.27717838 Stayed
## 9     Stayed 0.96519541   Left
## 10    Stayed 0.17150777 Stayed
perf_met(dLM_test)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed    800   53
##     Left      309  160
##                                           
##                Accuracy : 0.7262          
##                  95% CI : (0.7013, 0.7501)
##     No Information Rate : 0.8389          
##     P-Value [Acc > NIR] : 1               
##                                           
##                   Kappa : 0.3181          
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.7512          
##             Specificity : 0.7214          
##          Pos Pred Value : 0.3412          
##          Neg Pred Value : 0.9379          
##              Prevalence : 0.1611          
##          Detection Rate : 0.1210          
##    Detection Prevalence : 0.3548          
##       Balanced Accuracy : 0.7363          
##                                           
##        'Positive' Class : Left            
## 
## Setting levels: control = Stayed, case = Left
## Setting direction: controls < cases
## AUC       = 0.818

feature_imp(glm_mod_9)
## Analysis of Deviance Table
## 
## Model: quasibinomial, link: logit
## 
## Response: Attrition
## 
## Terms added sequentially (first to last)
## 
## 
##                         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                                     3087    1154.38              
## Age                      1   30.941      3086    1123.44 < 2.2e-16 ***
## BusinessTravel           2   24.042      3084    1099.39 1.761e-14 ***
## Department               2   15.516      3082    1083.88 1.329e-09 ***
## Education                4    4.788      3078    1079.09 0.0133214 *  
## JobRole                  8    6.725      3070    1072.36 0.0234434 *  
## MaritalStatus            2   28.611      3068    1043.75 < 2.2e-16 ***
## MonthlyIncome            1    0.806      3067    1042.95 0.1450764    
## NumCompaniesWorked       1   21.909      3066    1021.04 3.019e-14 ***
## TotalWorkingYears        1   14.403      3065    1006.64 7.268e-10 ***
## TrainingTimesLastYear    1    6.347      3064    1000.29 4.326e-05 ***
## YearsSinceLastPromotion  1    7.319      3063     992.97 1.127e-05 ***
## YearsWithCurrManager     1   21.784      3062     971.18 3.570e-14 ***
## EnvironmentSatisfaction  4   26.360      3058     944.82 2.970e-14 ***
## JobSatisfaction          4   20.645      3054     924.18 4.357e-11 ***
## WorkLifeBalance          4   12.694      3050     911.49 9.693e-07 ***
## JobInvolvement           3    6.938      3047     904.55 0.0003851 ***
## AvgHrs                   1   53.177      3046     851.37 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Model 9 Performance

Accuracy 0.7262
Sensitivity 0.7512
Specificity 0.7214
AUC 0.818


Model 10. Remove MonthlyIncome, which was not significant in Model 9 (p ≈ 0.15)

glm_mod_10 = glm(Attrition ~ 
                  Age + 
                  BusinessTravel + 
                  Department + 
                  # DistanceFromHome + 
                  Education +
                  # EducationField + 
                  # Gender + 
                  # JobLevel + 
                  JobRole + 
                  MaritalStatus + 
                  # MonthlyIncome + 
                  NumCompaniesWorked + 
                  # PercentSalaryHike + 
                  # StockOptionLevel + 
                  TotalWorkingYears +
                  TrainingTimesLastYear + 
                  # YearsAtCompany + 
                  YearsSinceLastPromotion +
                  YearsWithCurrManager + 
                  EnvironmentSatisfaction + 
                  JobSatisfaction +
                  WorkLifeBalance +
                  JobInvolvement + 
                  # PerformanceRating + 
                  AvgHrs, 
                data = dLM_train,
                family = quasibinomial, 
                weights = weights)

dLM_test$probs = predict(glm_mod_10, newdata = dLM_test, type = 'response')
dLM_test = score_model(dLM_test, 0.5)
dLM_test[1:10, c('Attrition','probs','score')]
##    Attrition      probs  score
## 1       Left 0.65830003   Left
## 2     Stayed 0.01630002 Stayed
## 3     Stayed 0.52236878   Left
## 4       Left 0.42621947 Stayed
## 5     Stayed 0.63128309   Left
## 6     Stayed 0.55196791   Left
## 7     Stayed 0.22995908 Stayed
## 8     Stayed 0.28095043 Stayed
## 9     Stayed 0.97006920   Left
## 10    Stayed 0.16200878 Stayed
perf_met(dLM_test)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed    801   54
##     Left      308  159
##                                           
##                Accuracy : 0.7262          
##                  95% CI : (0.7013, 0.7501)
##     No Information Rate : 0.8389          
##     P-Value [Acc > NIR] : 1               
##                                           
##                   Kappa : 0.3164          
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.7465          
##             Specificity : 0.7223          
##          Pos Pred Value : 0.3405          
##          Neg Pred Value : 0.9368          
##              Prevalence : 0.1611          
##          Detection Rate : 0.1203          
##    Detection Prevalence : 0.3533          
##       Balanced Accuracy : 0.7344          
##                                           
##        'Positive' Class : Left            
## 
## Setting levels: control = Stayed, case = Left
## Setting direction: controls < cases
## AUC       = 0.817

feature_imp(glm_mod_10)
## Analysis of Deviance Table
## 
## Model: quasibinomial, link: logit
## 
## Response: Attrition
## 
## Terms added sequentially (first to last)
## 
## 
##                         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                                     3087    1154.38              
## Age                      1   30.941      3086    1123.44 < 2.2e-16 ***
## BusinessTravel           2   24.042      3084    1099.39 3.320e-14 ***
## Department               2   15.516      3082    1083.88 2.001e-09 ***
## Education                4    4.788      3078    1079.09 0.0148543 *  
## JobRole                  8    6.725      3070    1072.36 0.0265436 *  
## MaritalStatus            2   28.611      3068    1043.75 < 2.2e-16 ***
## NumCompaniesWorked       1   22.088      3067    1021.67 4.300e-14 ***
## TotalWorkingYears        1   14.476      3066    1007.19 9.747e-10 ***
## TrainingTimesLastYear    1    6.496      3065    1000.69 4.217e-05 ***
## YearsSinceLastPromotion  1    7.115      3064     993.58 1.820e-05 ***
## YearsWithCurrManager     1   21.761      3063     971.82 6.603e-14 ***
## EnvironmentSatisfaction  4   26.480      3059     945.34 5.021e-14 ***
## JobSatisfaction          4   20.427      3055     924.91 9.670e-11 ***
## WorkLifeBalance          4   12.780      3051     912.13 1.197e-06 ***
## JobInvolvement           3    6.967      3048     905.17 0.0004427 ***
## AvgHrs                   1   52.884      3047     852.28 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Model 10 Performance

Accuracy 0.7262
Sensitivity 0.7465
Specificity 0.7223
AUC 0.817


Model performance as measured by sensitivity deteriorates with further feature removal after Model 7. In addition, from Model 7 onward nearly all remaining variables are statistically significant (MonthlyIncome and PercentSalaryHike only at the 0.1 level). Therefore I have determined Model 7 to be my most predictive model.
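As a cross-check on the manual backward elimination above, base R's `drop1()` tests each remaining term marginally (refitting the model without that single term) rather than sequentially; a minimal sketch, assuming `glm_mod_7` as fitted above:

```r
# drop1() refits the model once per remaining term and reports the change
# in deviance if that single term alone were dropped. With a quasibinomial
# family an F test is used, since the dispersion is estimated; note that
# step() is not applicable here because AIC is undefined for quasi- families.
drop1(glm_mod_7, test = "F")

# Terms whose Pr(>F) stays above the chosen level (e.g. 0.05) would be the
# candidates for removal at the next iteration.
```

Because `drop1()` is order-independent, it guards against conclusions that are artifacts of the sequential ordering in the `anova()`-style tables above.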

Next we will examine whether the threshold of 0.5 is appropriate. If the correction for class imbalance in the labels was successful, this value should not need to change significantly.
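Besides scanning a grid of thresholds, pROC (loaded above) can suggest a data-driven cutoff via Youden's J statistic. A sketch, assuming `dLM_test$probs` holds the predicted probabilities of the chosen model:

```r
# Youden's J maximizes sensitivity + specificity - 1 over all candidate
# thresholds on the ROC curve.
roc_obj <- roc(dLM_test$Attrition, dLM_test$probs,
               levels = c('Stayed', 'Left'), direction = '<')
coords(roc_obj, x = "best", best.method = "youden", ret = "threshold")
```

If the imbalance correction worked, this suggested threshold should land near 0.5, which the grid search below lets us verify against the individual confusion matrices.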

# Score the test set with glm_mod_7 at a given threshold and print
# the resulting performance metrics.
test_threshold = function(test, threshold){
    test$probs = predict(glm_mod_7, newdata = test, type = 'response')
    test = score_model(test, threshold)
    cat('\n')
    cat(paste('For threshold = ', as.character(threshold), '\n'))
    print(perf_met(test))
}

thresholds = seq(0.1, 0.9, by = 0.1)
for(t in thresholds) test_threshold(dLM_test, t) # Iterate over the thresholds
## 
## For threshold =  0.1 
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed    209    4
##     Left      900  209
##                                          
##                Accuracy : 0.3162         
##                  95% CI : (0.2912, 0.342)
##     No Information Rate : 0.8389         
##     P-Value [Acc > NIR] : 1              
##                                          
##                   Kappa : 0.0629         
##                                          
##  Mcnemar's Test P-Value : <2e-16         
##                                          
##             Sensitivity : 0.9812         
##             Specificity : 0.1885         
##          Pos Pred Value : 0.1885         
##          Neg Pred Value : 0.9812         
##              Prevalence : 0.1611         
##          Detection Rate : 0.1581         
##    Detection Prevalence : 0.8389         
##       Balanced Accuracy : 0.5848         
##                                          
##        'Positive' Class : Left           
## 
## Setting levels: control = Stayed, case = Left
## Setting direction: controls < cases
## AUC       = 0.817
## 
## For threshold =  0.2 
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed    404   10
##     Left      705  203
##                                          
##                Accuracy : 0.4592         
##                  95% CI : (0.432, 0.4865)
##     No Information Rate : 0.8389         
##     P-Value [Acc > NIR] : 1              
##                                          
##                   Kappa : 0.1369         
##                                          
##  Mcnemar's Test P-Value : <2e-16         
##                                          
##             Sensitivity : 0.9531         
##             Specificity : 0.3643         
##          Pos Pred Value : 0.2236         
##          Neg Pred Value : 0.9758         
##              Prevalence : 0.1611         
##          Detection Rate : 0.1536         
##    Detection Prevalence : 0.6868         
##       Balanced Accuracy : 0.6587         
##                                          
##        'Positive' Class : Left           
## 
## Setting levels: control = Stayed, case = Left
## Setting direction: controls < cases

## AUC       = 0.817
## 
## For threshold =  0.3 
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed    553   26
##     Left      556  187
##                                           
##                Accuracy : 0.5598          
##                  95% CI : (0.5325, 0.5867)
##     No Information Rate : 0.8389          
##     P-Value [Acc > NIR] : 1               
##                                           
##                   Kappa : 0.1878          
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.8779          
##             Specificity : 0.4986          
##          Pos Pred Value : 0.2517          
##          Neg Pred Value : 0.9551          
##              Prevalence : 0.1611          
##          Detection Rate : 0.1415          
##    Detection Prevalence : 0.5620          
##       Balanced Accuracy : 0.6883          
##                                           
##        'Positive' Class : Left            
## 
## Setting levels: control = Stayed, case = Left
## Setting direction: controls < cases

## AUC       = 0.817
## 
## For threshold =  0.4 
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed    684   35
##     Left      425  178
##                                           
##                Accuracy : 0.652           
##                  95% CI : (0.6257, 0.6777)
##     No Information Rate : 0.8389          
##     P-Value [Acc > NIR] : 1               
##                                           
##                   Kappa : 0.2601          
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.8357          
##             Specificity : 0.6168          
##          Pos Pred Value : 0.2952          
##          Neg Pred Value : 0.9513          
##              Prevalence : 0.1611          
##          Detection Rate : 0.1346          
##    Detection Prevalence : 0.4561          
##       Balanced Accuracy : 0.7262          
##                                           
##        'Positive' Class : Left            
## 
## Setting levels: control = Stayed, case = Left
## Setting direction: controls < cases

## AUC       = 0.817
## 
## For threshold =  0.5 
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed    801   54
##     Left      308  159
##                                           
##                Accuracy : 0.7262          
##                  95% CI : (0.7013, 0.7501)
##     No Information Rate : 0.8389          
##     P-Value [Acc > NIR] : 1               
##                                           
##                   Kappa : 0.3164          
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.7465          
##             Specificity : 0.7223          
##          Pos Pred Value : 0.3405          
##          Neg Pred Value : 0.9368          
##              Prevalence : 0.1611          
##          Detection Rate : 0.1203          
##    Detection Prevalence : 0.3533          
##       Balanced Accuracy : 0.7344          
##                                           
##        'Positive' Class : Left            
## 
## Setting levels: control = Stayed, case = Left
## Setting direction: controls < cases

## AUC       = 0.817
## 
## For threshold =  0.6 
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed    913   70
##     Left      196  143
##                                           
##                Accuracy : 0.7988          
##                  95% CI : (0.7761, 0.8201)
##     No Information Rate : 0.8389          
##     P-Value [Acc > NIR] : 0.9999          
##                                           
##                   Kappa : 0.3992          
##                                           
##  Mcnemar's Test P-Value : 1.799e-14       
##                                           
##             Sensitivity : 0.6714          
##             Specificity : 0.8233          
##          Pos Pred Value : 0.4218          
##          Neg Pred Value : 0.9288          
##              Prevalence : 0.1611          
##          Detection Rate : 0.1082          
##    Detection Prevalence : 0.2564          
##       Balanced Accuracy : 0.7473          
##                                           
##        'Positive' Class : Left            
## 
## Setting levels: control = Stayed, case = Left
## Setting direction: controls < cases

## AUC       = 0.817
## 
## For threshold =  0.7 
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed    992   96
##     Left      117  117
##                                           
##                Accuracy : 0.8389          
##                  95% CI : (0.8179, 0.8583)
##     No Information Rate : 0.8389          
##     P-Value [Acc > NIR] : 0.5183          
##                                           
##                   Kappa : 0.4268          
##                                           
##  Mcnemar's Test P-Value : 0.1706          
##                                           
##             Sensitivity : 0.5493          
##             Specificity : 0.8945          
##          Pos Pred Value : 0.5000          
##          Neg Pred Value : 0.9118          
##              Prevalence : 0.1611          
##          Detection Rate : 0.0885          
##    Detection Prevalence : 0.1770          
##       Balanced Accuracy : 0.7219          
##                                           
##        'Positive' Class : Left            
## 
## Setting levels: control = Stayed, case = Left
## Setting direction: controls < cases

## AUC       = 0.817
## 
## For threshold =  0.8 
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed   1055  143
##     Left       54   70
##                                           
##                Accuracy : 0.851           
##                  95% CI : (0.8306, 0.8698)
##     No Information Rate : 0.8389          
##     P-Value [Acc > NIR] : 0.1225          
##                                           
##                   Kappa : 0.3368          
##                                           
##  Mcnemar's Test P-Value : 3.617e-10       
##                                           
##             Sensitivity : 0.32864         
##             Specificity : 0.95131         
##          Pos Pred Value : 0.56452         
##          Neg Pred Value : 0.88063         
##              Prevalence : 0.16112         
##          Detection Rate : 0.05295         
##    Detection Prevalence : 0.09380         
##       Balanced Accuracy : 0.63997         
##                                           
##        'Positive' Class : Left            
## 
## Setting levels: control = Stayed, case = Left
## Setting direction: controls < cases

## AUC       = 0.817
## 
## For threshold =  0.9 
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed   1100  192
##     Left        9   21
##                                           
##                Accuracy : 0.848           
##                  95% CI : (0.8275, 0.8669)
##     No Information Rate : 0.8389          
##     P-Value [Acc > NIR] : 0.1954          
##                                           
##                   Kappa : 0.1386          
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.09859         
##             Specificity : 0.99188         
##          Pos Pred Value : 0.70000         
##          Neg Pred Value : 0.85139         
##              Prevalence : 0.16112         
##          Detection Rate : 0.01589         
##    Detection Prevalence : 0.02269         
##       Balanced Accuracy : 0.54524         
##                                           
##        'Positive' Class : Left            
## 
## Setting levels: control = Stayed, case = Left
## Setting direction: controls < cases

## AUC       = 0.817

This threshold sweep indicates that 0.5 is a reasonable operating point for our model, balancing sensitivity (0.747) against specificity (0.722) without sacrificing either.
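Rather than reading the optimum off a manual sweep, pROC can return a threshold that maximizes Youden's J statistic directly. A minimal sketch, assuming the test-set labels live in `dt_test$Attrition` and the predicted probabilities of leaving in a column named `prob` (the column name is a hypothetical stand-in for whatever the scoring step above produced):

```r
## Sketch: programmatic threshold selection with pROC.
library(pROC)

roc_obj <- roc(dt_test$Attrition, dt_test$prob,
               levels = c("Stayed", "Left"), direction = "<")

## "best" maximizes Youden's J (sensitivity + specificity - 1)
coords(roc_obj, x = "best",
       ret = c("threshold", "sensitivity", "specificity"))
```

If recall on leavers matters more than balance, `coords(..., best.weights = ...)` can shift the optimum accordingly.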

3. Cross Validation

Using our final model glm_mod_7, I will perform repeated cross-validation on the entire dataset to examine whether the model generalizes well.

weights = ifelse(dt$Attrition == 'Left', 0.84, 0.16)

control <- trainControl(method = "repeatedcv",
                        number = 5,
                        repeats = 3,
                        returnResamp ="all",
                        savePredictions = TRUE, 
                        classProbs = TRUE,
                        summaryFunction = twoClassSummary)

set.seed(1955)
glm_final <- train(Attrition ~ 
                  Age + 
                  BusinessTravel + 
                  Department + 
                  Education +
                  JobLevel + 
                  JobRole + 
                  MaritalStatus + 
                  MonthlyIncome + 
                  NumCompaniesWorked + 
                  PercentSalaryHike + 
                  TotalWorkingYears +
                  TrainingTimesLastYear + 
                  YearsSinceLastPromotion +
                  YearsWithCurrManager + 
                  EnvironmentSatisfaction + 
                  JobSatisfaction +
                  WorkLifeBalance +
                  JobInvolvement + 
                  AvgHrs, 
                data=dt, 
                method="glm", 
                metric = "Sens",  # twoClassSummary reports ROC, Sens, Spec ("Recall" is not in its result set)
                weights = weights,
                trControl=control)

glm_final
## Generalized Linear Model 
## 
## 4410 samples
##   19 predictor
##    2 classes: 'Stayed', 'Left' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 3 times) 
## Summary of sample sizes: 3527, 3528, 3529, 3528, 3528, 3528, ... 
## Resampling results:
## 
##   ROC        Sens       Spec     
##   0.8195934  0.7368653  0.7627663


While the mean sensitivity has degraded slightly relative to the single train/test split, a small drop is expected under repeated resampling. I conclude that our final Logistic Regression model generalizes reasonably well.
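Beyond the averages printed above, it is worth checking how much the metrics vary from fold to fold. A short sketch using the `resample` slot that caret retains (available here because `returnResamp = "all"` was set in `trainControl`):

```r
## Sketch: inspect the per-fold metrics stored in the caret train object
## to confirm performance is stable across resamples.
summary(glm_final$resample[, c("ROC", "Sens", "Spec")])

## Spread across the 15 resamples (5 folds x 3 repeats)
sapply(glm_final$resample[, c("ROC", "Sens", "Spec")], sd)
```

A small standard deviation across resamples is further evidence that the model is not overfitting to any particular partition.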


Next, let’s take a look at an ensemble classification model, Random Forest.

B. Random Forest

Create a copy of the partitioned, scaled datasets as before.

dRF_train <- dt_train
dRF_test <- dt_test
head(dRF_train)
##          Age Attrition    BusinessTravel             Department
## 1  1.5351693    Stayed     Travel-Rarely                  Sales
## 3 -0.5410709    Stayed Travel-Frequently Research & Development
## 5 -0.5410709    Stayed     Travel-Rarely Research & Development
## 6  0.9887903    Stayed     Travel-Rarely Research & Development
## 7 -0.9781741      Left     Travel-Rarely Research & Development
## 9 -0.6503467    Stayed     Travel-Rarely Research & Development
##   DistanceFromHome     Education EducationField Gender JobLevel
## 1      -0.39949765       College  Life Sciences Female        1
## 3       0.94741412        Master          Other   Male        4
## 5       0.09028845 Below College        Medical   Male        1
## 6      -0.15460460      Bachelor  Life Sciences Female        4
## 7       0.21273497       College        Medical   Male        2
## 9      -1.01173027      Bachelor  Life Sciences   Male        3
##                     JobRole MaritalStatus MonthlyIncome NumCompaniesWorked
## 1 Healthcare Representative       Married     1.4088205         -0.6721110
## 3           Sales Executive       Married     2.7295641         -0.6721110
## 5           Sales Executive        Single    -0.8818574          0.5352647
## 6         Research Director       Married    -0.5142519          0.1328061
## 7           Sales Executive        Single    -0.1438824         -0.2696525
## 9     Laboratory Technician       Married    -0.9452157         -1.0745696
##   PercentSalaryHike StockOptionLevel TotalWorkingYears TrainingTimesLastYear
## 1       -1.15935900                0        -1.3227615             2.5011973
## 3       -0.06678233                3        -0.8096420            -0.6088076
## 5       -0.88621483                2        -0.2965226            -0.6088076
## 6       -0.61307067                0         2.1407948             1.7236961
## 7        1.29893850                1        -0.8096420            -0.6088076
## 9        1.57208267                0        -0.1682427            -0.6088076
##   YearsAtCompany YearsSinceLastPromotion YearsWithCurrManager
## 1    -0.97857482              -0.6824716          -1.16099726
## 3    -0.33730174              -0.6824716          -0.32872313
## 5    -0.17698348              -0.6824716          -0.05129842
## 6    -0.01666521               1.4623824           0.78097571
## 7    -1.13889309              -0.6824716          -1.16099726
## 9     0.30397133               1.4623824           1.05840042
##   EnvironmentSatisfaction JobSatisfaction WorkLifeBalance JobInvolvement
## 1                    High       Very High            Good           High
## 3                  Medium          Medium             Bad           High
## 5               Very High             Low          Better           High
## 6                    High          Medium            Good           High
## 7                     Low            High             Bad           High
## 9                  Medium       Very High          Better           High
##   PerformanceRating     AvgHrs
## 1         Excellent -0.2538199
## 3         Excellent -0.5216240
## 5         Excellent  0.2222764
## 6         Excellent  2.2977583
## 7       Outstanding -0.5885750
## 9       Outstanding -0.3505269
head(dRF_test)
##           Age Attrition    BusinessTravel             Department
## 2  -0.6503467      Left Travel-Frequently Research & Development
## 4   0.1145839    Stayed        Non-Travel Research & Development
## 8  -0.8688983    Stayed     Travel-Rarely Research & Development
## 14  1.0980661      Left        Non-Travel Research & Development
## 17 -1.7431047    Stayed     Travel-Rarely Research & Development
## 21 -1.1967257    Stayed Travel-Frequently Research & Development
##    DistanceFromHome     Education EducationField Gender JobLevel
## 2        0.09028845 Below College  Life Sciences Female        1
## 4       -0.88928374        Doctor  Life Sciences   Male        3
## 8        1.06986064      Bachelor  Life Sciences   Male        2
## 14      -1.01173027 Below College        Medical   Male        1
## 17      -0.76683722       College  Life Sciences   Male        1
## 21      -1.01173027        Master          Other   Male        2
##                  JobRole MaritalStatus MonthlyIncome NumCompaniesWorked
## 2     Research Scientist        Single    -0.4891637         -1.0745696
## 4        Human Resources       Married     0.3893476          0.1328061
## 8        Sales Executive       Married    -0.7115555         -0.2696525
## 14    Research Scientist       Married    -0.1547256         -0.6721110
## 17 Laboratory Technician        Single    -0.4840610         -0.6721110
## 21 Laboratory Technician      Divorced     0.8413600         -0.6721110
##    PercentSalaryHike StockOptionLevel TotalWorkingYears TrainingTimesLastYear
## 2          2.1183710                1        -0.6813621             0.1686936
## 4         -1.1593590                3         0.2165969             1.7236961
## 8          1.8452268                3        -0.1682427            -0.6088076
## 14        -1.1593590                2        -0.1682427             0.9461948
## 17        -0.8862148                3        -1.0662017             0.1686936
## 21         0.7526502                0        -0.6813621             0.1686936
##    YearsAtCompany YearsSinceLastPromotion YearsWithCurrManager
## 2      -0.3373017              -0.3760639          -0.05129842
## 4       0.1436531               1.4623824           0.22612629
## 8      -1.1388931              -0.6824716          -1.16099726
## 14      0.4642896               2.0751978           1.33582514
## 17     -0.6579383              -0.3760639          -1.16099726
## 21     -0.1769835              -0.3760639          -0.05129842
##    EnvironmentSatisfaction JobSatisfaction WorkLifeBalance JobInvolvement
## 2                     High          Medium            Best         Medium
## 4                Very High       Very High          Better         Medium
## 8                      Low          Medium          Better           High
## 14                     Low          Medium            Good         Medium
## 17               Very High            High            Best         Medium
## 21                    High          Medium             Bad           High
##    PerformanceRating       AvgHrs
## 2        Outstanding  0.006545263
## 4          Excellent -0.387721916
## 8        Outstanding -0.729916072
## 14         Excellent  1.256297831
## 17         Excellent -0.811745109
## 21         Excellent -0.090161781


Modify the performance metric function, since predict() on a randomForest model returns class labels by default rather than probabilities.

perf_met <- function(df) {
  #Confusion Matrix Summary
  cm <- suppressWarnings(confusionMatrix(data = as.factor(df$score), 
                                         reference = as.factor(df$Attrition), 
                                         positive = "Left"))
  print(cm)

  table <- data.frame(cm$table)
  
  plotTable <- table %>%
    mutate(Correctness = ifelse(table$Prediction == table$Reference, "Correct", "Incorrect")) %>%
    group_by(Reference) %>%
    mutate(Proportion = Freq/sum(Freq))
  
  # Fill alpha relative to sensitivity/specificity by proportional outcomes within reference groups 
  ggplot(data = plotTable, 
         mapping = aes(x=Reference, y=Prediction, fill=Correctness, alpha=Proportion)) + 
      geom_tile() +
      geom_text(aes(label=Freq), vjust=.5, fontface="bold", alpha=1) +
      scale_fill_manual(values = c(Correct="#264d73", Incorrect="#b30000")) +
      xlim(rev(levels(table$Reference))) +
      ylim(levels(table$Prediction)) +
      theme_light()
}

## Function to show which features are important.
feature_imp <- function(mod) {
    imp = varImp(mod)
    
    plot <- ggplot(imp, aes(x=reorder(rownames(imp),Overall), y=Overall)) +
        geom_point(color="skyblue", size=2, alpha=0.8) +
        geom_segment(aes(x=rownames(imp), xend=rownames(imp), y=0, yend=Overall), color='skyblue') +
        xlab('Variable') + 
        ylab('Overall Importance') +
        theme_light() +
        coord_flip() 
  print(plot)
}

1. Baseline Random Forest Model

library(randomForest)

rf_mod <- randomForest(Attrition ~ ., 
                       data = dRF_train)
print(rf_mod)
## 
## Call:
##  randomForest(formula = Attrition ~ ., data = dRF_train) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 5
## 
##         OOB estimate of  error rate: 1.55%
## Confusion matrix:
##        Stayed Left class.error
## Stayed   2585    5 0.001930502
## Left       43  455 0.086345382
dRF_test$score = predict(rf_mod, newdata = dRF_test)
dRF_test[1:10, c('Attrition','score')]
##    Attrition  score
## 2       Left   Left
## 4     Stayed Stayed
## 8     Stayed Stayed
## 14      Left   Left
## 17    Stayed Stayed
## 21    Stayed Stayed
## 24    Stayed Stayed
## 25    Stayed Stayed
## 27    Stayed Stayed
## 28    Stayed Stayed
perf_met(dRF_test)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed   1109   12
##     Left        0  201
##                                           
##                Accuracy : 0.9909          
##                  95% CI : (0.9842, 0.9953)
##     No Information Rate : 0.8389          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9656          
##                                           
##  Mcnemar's Test P-Value : 0.001496        
##                                           
##             Sensitivity : 0.9437          
##             Specificity : 1.0000          
##          Pos Pred Value : 1.0000          
##          Neg Pred Value : 0.9893          
##              Prevalence : 0.1611          
##          Detection Rate : 0.1520          
##    Detection Prevalence : 0.1520          
##       Balanced Accuracy : 0.9718          
##                                           
##        'Positive' Class : Left            
## 

feature_imp(rf_mod)

Insights

  • This baseline Random Forest model performed extremely well across all metrics, so well that I was initially skeptical of the results. For example, accuracy was over 99%, sensitivity was 94%, and specificity was 1. This led me to suspect that the dataset is not anonymized real data but was manufactured for teaching or exercise purposes, perhaps specifically for Logistic Regression. The unusually polished state of even the raw datasets supports this conjecture.
  • Even though this Random Forest model is already close to perfect, I will continue with feature selection and model tuning to showcase what the next steps should be.
  • Gender and PerformanceRating again ranked lowest in variable importance, so I will remove these features in the modeling that follows.
  • Random Forest is regarded as handling class imbalance well, as reflected here in the high sensitivity alongside perfect specificity.

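For completeness, a probability-based evaluation is also possible for the baseline forest: predict() returns class labels by default, but class probabilities are available with `type = "prob"`, which enables the same ROC/AUC analysis used for the logistic model. A sketch, assuming `rf_mod` and `dRF_test` from above:

```r
## Sketch: class probabilities from a randomForest model, and test-set AUC.
library(pROC)

rf_probs <- predict(rf_mod, newdata = dRF_test, type = "prob")
head(rf_probs)  # one column per class: 'Stayed', 'Left'

roc(dRF_test$Attrition, rf_probs[, "Left"],
    levels = c("Stayed", "Left"), direction = "<")$auc
```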

2. Model Tuning and Cross Validation

To tune the mtry and ntree hyperparameters jointly, I created a custom Random Forest method for use with the caret package’s train() function.

## Create custom training algorithm that tunes both mtry and ntree together.
customRF <- list(type = "Classification", library = "randomForest", loop = NULL)
customRF$parameters <- data.frame(parameter = c("mtry", "ntree"), 
                                  class = rep("numeric", 2), 
                                  label = c("mtry", "ntree"))
customRF$grid <- function(x, y, len = NULL, search = "grid") {}
customRF$fit <- function(x, y, wts, param, lev, last, weights, classProbs, ...){
  randomForest(x, y, mtry = param$mtry, ntree=param$ntree, ...)}
customRF$predict <- function(modelFit, newdata, preProc = NULL, submodels = NULL){
   predict(modelFit, newdata)}
customRF$prob <- function(modelFit, newdata, preProc = NULL, submodels = NULL){
   predict(modelFit, newdata, type = "prob")}
customRF$sort <- function(x) {x[order(x[,1]),]}
customRF$levels <- function(x) {x$classes}

# Train model with customRF training algorithm.
weights = ifelse(dRF_train$Attrition == 'Left', 0.84, 0.16)

control <- trainControl(method = "repeatedcv",
                        number = 5,
                        repeats = 3,
                        search='grid',
                        returnResamp ="all",
                        savePredictions = TRUE, 
                        classProbs = TRUE,
                        summaryFunction = twoClassSummary)

# Hyperparameter grid
tunegrid <- expand.grid(.mtry=c(5:12), .ntree=c(101,501,1001,2001))

set.seed(1955)
myRF_mod <- train(Attrition ~ 
                         Age + 
                         BusinessTravel + 
                         Department + 
                         DistanceFromHome + 
                         Education +
                         EducationField + 
                        # Gender + 
                         JobLevel + 
                         JobRole + 
                         MaritalStatus + 
                         MonthlyIncome + 
                         NumCompaniesWorked + 
                         PercentSalaryHike + 
                         StockOptionLevel + 
                         TotalWorkingYears +
                         TrainingTimesLastYear + 
                         YearsAtCompany + 
                         YearsSinceLastPromotion +
                         YearsWithCurrManager + 
                         EnvironmentSatisfaction + 
                         JobSatisfaction +
                         WorkLifeBalance +
                         JobInvolvement + 
                        # PerformanceRating + 
                         AvgHrs,
                data=dRF_train, 
                method=customRF, 
                metric= "Sens",
                tuneGrid=tunegrid, 
                trControl=control)

plot(myRF_mod) 

dRF_test$score = predict(myRF_mod, newdata = dRF_test)
perf_met(dRF_test)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed   1109   12
##     Left        0  201
##                                           
##                Accuracy : 0.9909          
##                  95% CI : (0.9842, 0.9953)
##     No Information Rate : 0.8389          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9656          
##                                           
##  Mcnemar's Test P-Value : 0.001496        
##                                           
##             Sensitivity : 0.9437          
##             Specificity : 1.0000          
##          Pos Pred Value : 1.0000          
##          Neg Pred Value : 0.9893          
##              Prevalence : 0.1611          
##          Detection Rate : 0.1520          
##    Detection Prevalence : 0.1520          
##       Balanced Accuracy : 0.9718          
##                                           
##        'Positive' Class : Left            
## 

myRF_mod
## 3088 samples
##   23 predictor
##    2 classes: 'Stayed', 'Left' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 3 times) 
## Summary of sample sizes: 2470, 2470, 2471, 2471, 2470, 2471, ... 
## Resampling results across tuning parameters:
## 
##   mtry  ntree  ROC        Sens       Spec     
##    5     101   0.9831072  0.9989704  0.7884781
##    5     501   0.9858427  0.9988417  0.7925118
##    5    1001   0.9865139  0.9987130  0.7904848
##    5    2001   0.9861856  0.9988417  0.7898249
##    6     101   0.9814024  0.9983269  0.8011852
##    6     501   0.9849812  0.9985843  0.7924983
##    6    1001   0.9850839  0.9985843  0.7971987
##    6    2001   0.9857870  0.9988417  0.7965118
##    7     101   0.9840768  0.9976834  0.7985185
##    7     501   0.9853648  0.9980695  0.7945118
##    7    1001   0.9853906  0.9987130  0.7918384
##    7    2001   0.9853033  0.9983269  0.7951852
##    8     101   0.9825889  0.9975547  0.7991717
##    8     501   0.9845302  0.9979408  0.7991785
##    8    1001   0.9850600  0.9980695  0.7965185
##    8    2001   0.9849888  0.9979408  0.8005253
##    9     101   0.9838705  0.9980695  0.8058586
##    9     501   0.9831738  0.9979408  0.7971717
##    9    1001   0.9836694  0.9980695  0.8011987
##    9    2001   0.9837432  0.9976834  0.7978384
##   10     101   0.9838431  0.9974260  0.8038249
##   10     501   0.9833145  0.9979408  0.7971515
##   10    1001   0.9837660  0.9978121  0.8045320
##   10    2001   0.9836086  0.9979408  0.7971785
##   11     101   0.9801069  0.9978121  0.8052256
##   11     501   0.9833862  0.9975547  0.7998586
##   11    1001   0.9831305  0.9975547  0.8018653
##   11    2001   0.9834334  0.9979408  0.8031987
##   12     101   0.9815505  0.9972973  0.8058519
##   12     501   0.9824013  0.9974260  0.8045387
##   12    1001   0.9822733  0.9976834  0.8045387
##   12    2001   0.9827472  0.9978121  0.8052189
## 
## Sens was used to select the optimal model using the largest value.
## The final values used for the model were mtry = 5 and ntree = 101.


Insights

  • The best performing random forest model used mtry = 5 and ntree = 101.
  • Removing the two lowest-importance variables did not significantly change performance, indicating they can safely be dropped.

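The winning configuration can also be pulled straight from the caret object rather than read off the printed table; a short sketch:

```r
## Sketch: retrieve the selected hyperparameters and the final refit model.
myRF_mod$bestTune    # data frame with the winning mtry and ntree
myRF_mod$finalModel  # the underlying randomForest object refit on all training data
```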

The final model for comparison is the Neural Network algorithm.



C. Neural Network

Create a copy of the partitioned, scaled datasets as before.

dNN_train <- dt_train
dNN_test <- dt_test
head(dNN_train)
##          Age Attrition    BusinessTravel             Department
## 1  1.5351693    Stayed     Travel-Rarely                  Sales
## 3 -0.5410709    Stayed Travel-Frequently Research & Development
## 5 -0.5410709    Stayed     Travel-Rarely Research & Development
## 6  0.9887903    Stayed     Travel-Rarely Research & Development
## 7 -0.9781741      Left     Travel-Rarely Research & Development
## 9 -0.6503467    Stayed     Travel-Rarely Research & Development
##   DistanceFromHome     Education EducationField Gender JobLevel
## 1      -0.39949765       College  Life Sciences Female        1
## 3       0.94741412        Master          Other   Male        4
## 5       0.09028845 Below College        Medical   Male        1
## 6      -0.15460460      Bachelor  Life Sciences Female        4
## 7       0.21273497       College        Medical   Male        2
## 9      -1.01173027      Bachelor  Life Sciences   Male        3
##                     JobRole MaritalStatus MonthlyIncome NumCompaniesWorked
## 1 Healthcare Representative       Married     1.4088205         -0.6721110
## 3           Sales Executive       Married     2.7295641         -0.6721110
## 5           Sales Executive        Single    -0.8818574          0.5352647
## 6         Research Director       Married    -0.5142519          0.1328061
## 7           Sales Executive        Single    -0.1438824         -0.2696525
## 9     Laboratory Technician       Married    -0.9452157         -1.0745696
##   PercentSalaryHike StockOptionLevel TotalWorkingYears TrainingTimesLastYear
## 1       -1.15935900                0        -1.3227615             2.5011973
## 3       -0.06678233                3        -0.8096420            -0.6088076
## 5       -0.88621483                2        -0.2965226            -0.6088076
## 6       -0.61307067                0         2.1407948             1.7236961
## 7        1.29893850                1        -0.8096420            -0.6088076
## 9        1.57208267                0        -0.1682427            -0.6088076
##   YearsAtCompany YearsSinceLastPromotion YearsWithCurrManager
## 1    -0.97857482              -0.6824716          -1.16099726
## 3    -0.33730174              -0.6824716          -0.32872313
## 5    -0.17698348              -0.6824716          -0.05129842
## 6    -0.01666521               1.4623824           0.78097571
## 7    -1.13889309              -0.6824716          -1.16099726
## 9     0.30397133               1.4623824           1.05840042
##   EnvironmentSatisfaction JobSatisfaction WorkLifeBalance JobInvolvement
## 1                    High       Very High            Good           High
## 3                  Medium          Medium             Bad           High
## 5               Very High             Low          Better           High
## 6                    High          Medium            Good           High
## 7                     Low            High             Bad           High
## 9                  Medium       Very High          Better           High
##   PerformanceRating     AvgHrs
## 1         Excellent -0.2538199
## 3         Excellent -0.5216240
## 5         Excellent  0.2222764
## 6         Excellent  2.2977583
## 7       Outstanding -0.5885750
## 9       Outstanding -0.3505269
head(dNN_test)
##           Age Attrition    BusinessTravel             Department
## 2  -0.6503467      Left Travel-Frequently Research & Development
## 4   0.1145839    Stayed        Non-Travel Research & Development
## 8  -0.8688983    Stayed     Travel-Rarely Research & Development
## 14  1.0980661      Left        Non-Travel Research & Development
## 17 -1.7431047    Stayed     Travel-Rarely Research & Development
## 21 -1.1967257    Stayed Travel-Frequently Research & Development
##    DistanceFromHome     Education EducationField Gender JobLevel
## 2        0.09028845 Below College  Life Sciences Female        1
## 4       -0.88928374        Doctor  Life Sciences   Male        3
## 8        1.06986064      Bachelor  Life Sciences   Male        2
## 14      -1.01173027 Below College        Medical   Male        1
## 17      -0.76683722       College  Life Sciences   Male        1
## 21      -1.01173027        Master          Other   Male        2
##                  JobRole MaritalStatus MonthlyIncome NumCompaniesWorked
## 2     Research Scientist        Single    -0.4891637         -1.0745696
## 4        Human Resources       Married     0.3893476          0.1328061
## 8        Sales Executive       Married    -0.7115555         -0.2696525
## 14    Research Scientist       Married    -0.1547256         -0.6721110
## 17 Laboratory Technician        Single    -0.4840610         -0.6721110
## 21 Laboratory Technician      Divorced     0.8413600         -0.6721110
##    PercentSalaryHike StockOptionLevel TotalWorkingYears TrainingTimesLastYear
## 2          2.1183710                1        -0.6813621             0.1686936
## 4         -1.1593590                3         0.2165969             1.7236961
## 8          1.8452268                3        -0.1682427            -0.6088076
## 14        -1.1593590                2        -0.1682427             0.9461948
## 17        -0.8862148                3        -1.0662017             0.1686936
## 21         0.7526502                0        -0.6813621             0.1686936
##    YearsAtCompany YearsSinceLastPromotion YearsWithCurrManager
## 2      -0.3373017              -0.3760639          -0.05129842
## 4       0.1436531               1.4623824           0.22612629
## 8      -1.1388931              -0.6824716          -1.16099726
## 14      0.4642896               2.0751978           1.33582514
## 17     -0.6579383              -0.3760639          -1.16099726
## 21     -0.1769835              -0.3760639          -0.05129842
##    EnvironmentSatisfaction JobSatisfaction WorkLifeBalance JobInvolvement
## 2                     High          Medium            Best         Medium
## 4                Very High       Very High          Better         Medium
## 8                      Low          Medium          Better           High
## 14                     Low          Medium            Good         Medium
## 17               Very High            High            Best         Medium
## 21                    High          Medium             Bad           High
##    PerformanceRating       AvgHrs
## 2        Outstanding  0.006545263
## 4          Excellent -0.387721916
## 8        Outstanding -0.729916072
## 14         Excellent  1.256297831
## 17         Excellent -0.811745109
## 21         Excellent -0.090161781
perf_met <- function(df) {
  # Confusion matrix summary
  cm <- suppressWarnings(confusionMatrix(data = as.factor(df$score), 
                                         reference = as.factor(df$Attrition), 
                                         positive = "Left"))
  print(cm)

  table <- data.frame(cm$table)
  
  plotTable <- table %>%
    mutate(Correctness = ifelse(Prediction == Reference, "Correct", "Incorrect")) %>%
    group_by(Reference) %>%
    mutate(Proportion = Freq/sum(Freq))
  
  # Fill alpha relative to sensitivity/specificity: proportional outcomes within reference groups 
  ggplot(data = plotTable, 
         mapping = aes(x=Reference, y=Prediction, fill=Correctness, alpha=Proportion)) + 
      geom_tile() +
      geom_text(aes(label=Freq), vjust=.5, fontface="bold", alpha=1) +
      scale_fill_manual(values = c(Correct="#264d73", Incorrect="#b30000")) +
      xlim(rev(levels(table$Reference))) +
      ylim(levels(table$Prediction)) +
      theme_light()
}
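As a quick usage sketch, `perf_met()` expects a data frame with an `Attrition` column (the reference labels) and a `score` column (the predicted labels); the toy data below is illustrative only, not the HR dataset:

```r
# Hypothetical toy predictions to illustrate the expected input shape
toy <- data.frame(
  Attrition = c("Left", "Stayed", "Left", "Stayed"),
  score     = c("Left", "Stayed", "Stayed", "Stayed")
)
perf_met(toy)  # prints the confusion matrix and draws the tile plot
```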

## Function to show which features are important.
feature_imp <- function(mod) {
    imp <- varImp(mod)
    
    plot <- ggplot(imp, aes(x=reorder(rownames(imp), Overall), y=Overall)) +
        geom_point(color="skyblue", size=2, alpha=0.8) +
        geom_segment(aes(x=rownames(imp), xend=rownames(imp), y=0, yend=Overall), color='skyblue') +
        xlab('Variable') + 
        ylab('Overall Importance') +
        theme_light() +
        coord_flip() 
    print(plot)
}
1. Baseline Neural Network Model using caret
nn_mod <- train(Attrition ~ .,
                data = dNN_train,  
                method = "nnet")
## # weights:  63
## initial  value 2708.638435 
## iter  10 value 1186.350693
## iter  20 value 1069.462031
## iter  30 value 1011.774638
## iter  40 value 974.843920
## iter  50 value 951.724334
## iter  60 value 946.526260
## iter  70 value 946.513051
## final  value 946.512998 
## converged
## # weights:  187
## initial  value 2782.076469 
## iter  10 value 1368.501277
## iter  20 value 1300.541478
## iter  30 value 1226.148970
## iter  40 value 1162.375646
## iter  50 value 1137.780134
## iter  60 value 1114.653380
## iter  70 value 1097.589681
## iter  80 value 1091.582789
## iter  90 value 1053.853154
## iter 100 value 986.440765
## final  value 986.440765 
## stopped after 100 iterations
## # weights:  311
## initial  value 2200.118623 
## iter  10 value 1148.316073
## iter  20 value 922.714514
## iter  30 value 778.738367
## iter  40 value 686.184812
## iter  50 value 657.418196
## iter  60 value 632.466211
## iter  70 value 606.036621
## iter  80 value 590.154980
## iter  90 value 568.327155
## iter 100 value 554.764469
## final  value 554.764469 
## stopped after 100 iterations
## ... (equivalent nnet optimizer traces for the remaining resamples and tuning candidates omitted)
## iter  40 value 1157.839743
## iter  50 value 1064.698485
## iter  60 value 1019.577487
## iter  70 value 1008.229113
## iter  80 value 996.363339
## iter  90 value 985.139893
## iter 100 value 966.910732
## final  value 966.910732 
## stopped after 100 iterations
## # weights:  63
## initial  value 2234.160464 
## iter  10 value 1142.828059
## iter  20 value 1022.282510
## iter  30 value 999.286686
## iter  40 value 993.189216
## iter  50 value 991.648022
## iter  60 value 991.619938
## final  value 991.618852 
## converged
## # weights:  187
## initial  value 1778.537453 
## iter  10 value 1088.834546
## iter  20 value 931.154946
## iter  30 value 884.762019
## iter  40 value 835.220025
## iter  50 value 784.790909
## iter  60 value 763.106901
## iter  70 value 754.886057
## iter  80 value 740.309569
## iter  90 value 731.735121
## iter 100 value 718.417129
## final  value 718.417129 
## stopped after 100 iterations
## # weights:  311
## initial  value 2220.773437 
## iter  10 value 1176.416613
## iter  20 value 995.572551
## iter  30 value 834.537517
## iter  40 value 715.215599
## iter  50 value 647.689540
## iter  60 value 598.019702
## iter  70 value 541.391079
## iter  80 value 496.480282
## iter  90 value 469.316455
## iter 100 value 453.385248
## final  value 453.385248 
## stopped after 100 iterations
## # weights:  63
## initial  value 2743.540073 
## iter  10 value 1170.900252
## iter  20 value 1046.682361
## iter  30 value 960.949082
## iter  40 value 938.576237
## iter  50 value 923.543119
## iter  60 value 904.078122
## iter  70 value 902.391658
## iter  80 value 901.812666
## iter  90 value 901.529625
## iter 100 value 900.928860
## final  value 900.928860 
## stopped after 100 iterations
## # weights:  187
## initial  value 1903.552738 
## iter  10 value 1011.713204
## iter  20 value 817.838513
## iter  30 value 706.975807
## iter  40 value 639.131463
## iter  50 value 587.816985
## iter  60 value 570.240525
## iter  70 value 552.199580
## iter  80 value 539.795568
## iter  90 value 533.973363
## iter 100 value 526.191920
## final  value 526.191920 
## stopped after 100 iterations
## # weights:  311
## initial  value 4325.350441 
## iter  10 value 1352.844203
## iter  20 value 1086.455908
## iter  30 value 946.990879
## iter  40 value 788.480792
## iter  50 value 666.424163
## iter  60 value 610.393300
## iter  70 value 581.372288
## iter  80 value 548.255284
## iter  90 value 528.137284
## iter 100 value 524.046820
## final  value 524.046820 
## stopped after 100 iterations
## # weights:  63
## initial  value 1754.277525 
## iter  10 value 1132.314187
## iter  20 value 1012.101999
## iter  30 value 955.271066
## iter  40 value 940.172116
## iter  50 value 917.097627
## iter  60 value 881.435856
## iter  70 value 880.369919
## iter  80 value 880.355772
## final  value 880.355756 
## converged
## # weights:  187
## initial  value 2485.103074 
## iter  10 value 1307.643639
## iter  20 value 1246.566601
## iter  30 value 1177.468611
## iter  40 value 1120.300478
## iter  50 value 1076.138720
## iter  60 value 1048.058532
## iter  70 value 1006.894584
## iter  80 value 993.874278
## iter  90 value 986.302816
## iter 100 value 983.550744
## final  value 983.550744 
## stopped after 100 iterations
## # weights:  311
## initial  value 2709.304588 
## iter  10 value 1277.042583
## iter  20 value 964.529897
## iter  30 value 713.366885
## iter  40 value 584.047875
## iter  50 value 534.003898
## iter  60 value 489.325008
## iter  70 value 468.958187
## iter  80 value 459.472113
## iter  90 value 442.092133
## iter 100 value 435.993793
## final  value 435.993793 
## stopped after 100 iterations
## # weights:  63
## initial  value 1766.330644 
## iter  10 value 1119.860265
## iter  20 value 1017.516347
## iter  30 value 980.435255
## iter  40 value 973.891105
## iter  50 value 967.856928
## iter  60 value 955.637791
## iter  70 value 954.900989
## iter  80 value 954.828000
## final  value 954.827125 
## converged
## # weights:  187
## initial  value 1408.273143 
## iter  10 value 1007.247522
## iter  20 value 912.558922
## iter  30 value 869.981112
## iter  40 value 834.473951
## iter  50 value 801.334095
## iter  60 value 772.771769
## iter  70 value 736.623987
## iter  80 value 719.194244
## iter  90 value 716.058823
## iter 100 value 715.381053
## final  value 715.381053 
## stopped after 100 iterations
## # weights:  311
## initial  value 2425.706669 
## iter  10 value 1266.121697
## iter  20 value 1084.028530
## iter  30 value 927.508307
## iter  40 value 810.578519
## iter  50 value 670.801979
## iter  60 value 596.090362
## iter  70 value 544.655016
## iter  80 value 501.183864
## iter  90 value 466.743495
## iter 100 value 449.662770
## final  value 449.662770 
## stopped after 100 iterations
## # weights:  63
## initial  value 1968.358214 
## iter  10 value 1308.414048
## iter  20 value 1217.824249
## iter  30 value 1152.695534
## iter  40 value 1105.149559
## iter  50 value 1085.083269
## iter  60 value 1073.529658
## iter  70 value 1070.384854
## iter  80 value 1068.468591
## iter  90 value 1067.966114
## iter 100 value 1067.636936
## final  value 1067.636936 
## stopped after 100 iterations
## # weights:  187
## initial  value 1717.902998 
## iter  10 value 992.532720
## iter  20 value 866.757009
## iter  30 value 790.066250
## iter  40 value 726.293581
## iter  50 value 693.725456
## iter  60 value 677.338276
## iter  70 value 665.697840
## iter  80 value 662.985476
## iter  90 value 661.148481
## iter 100 value 659.096840
## final  value 659.096840 
## stopped after 100 iterations
## # weights:  311
## initial  value 1661.030037 
## iter  10 value 955.222461
## iter  20 value 648.400979
## iter  30 value 532.543656
## iter  40 value 469.543539
## iter  50 value 429.746740
## iter  60 value 405.743738
## iter  70 value 386.956738
## iter  80 value 378.061040
## iter  90 value 364.167322
## iter 100 value 353.279312
## final  value 353.279312 
## stopped after 100 iterations
## # weights:  63
## initial  value 2714.387852 
## iter  10 value 1315.442153
## iter  20 value 1313.493549
## iter  30 value 1296.438895
## iter  40 value 1290.616507
## iter  50 value 1290.534298
## iter  60 value 1290.530151
## final  value 1290.529203 
## converged
## # weights:  187
## initial  value 3849.357408 
## iter  10 value 1228.013988
## iter  20 value 1037.723080
## iter  30 value 930.772614
## iter  40 value 826.486247
## iter  50 value 797.660273
## iter  60 value 756.064151
## iter  70 value 717.375269
## iter  80 value 703.044706
## iter  90 value 683.256663
## iter 100 value 671.478514
## final  value 671.478514 
## stopped after 100 iterations
## # weights:  311
## initial  value 2397.725992 
## iter  10 value 1178.787921
## iter  20 value 1004.360232
## iter  30 value 847.353201
## iter  40 value 657.753451
## iter  50 value 571.662692
## iter  60 value 520.773224
## iter  70 value 481.973645
## iter  80 value 457.345240
## iter  90 value 441.685167
## iter 100 value 433.101441
## final  value 433.101441 
## stopped after 100 iterations
## # weights:  63
## initial  value 1800.098218 
## iter  10 value 1065.276944
## iter  20 value 981.664560
## iter  30 value 949.714117
## iter  40 value 938.411971
## iter  50 value 932.558303
## iter  60 value 931.493065
## iter  70 value 928.590845
## iter  80 value 925.941187
## iter  90 value 925.736654
## final  value 925.735449 
## converged
## # weights:  187
## initial  value 1813.958246 
## iter  10 value 1027.237644
## iter  20 value 869.619130
## iter  30 value 805.749177
## iter  40 value 778.293803
## iter  50 value 759.170699
## iter  60 value 744.578603
## iter  70 value 737.330301
## iter  80 value 724.790957
## iter  90 value 700.118487
## iter 100 value 675.910625
## final  value 675.910625 
## stopped after 100 iterations
## # weights:  311
## initial  value 2747.318900 
## iter  10 value 1196.057451
## iter  20 value 1007.432622
## iter  30 value 882.656883
## iter  40 value 793.096745
## iter  50 value 718.026760
## iter  60 value 647.901926
## iter  70 value 608.714483
## iter  80 value 588.937619
## iter  90 value 576.958416
## iter 100 value 565.346715
## final  value 565.346715 
## stopped after 100 iterations
## # weights:  63
## initial  value 2136.794835 
## iter  10 value 1146.398047
## iter  20 value 1033.848232
## iter  30 value 974.858672
## iter  40 value 949.274105
## iter  50 value 931.584146
## iter  60 value 909.233858
## iter  70 value 907.695837
## iter  80 value 907.484860
## iter  90 value 907.317972
## iter 100 value 907.204820
## final  value 907.204820 
## stopped after 100 iterations
## # weights:  187
## initial  value 2854.546617 
## iter  10 value 1077.877586
## iter  20 value 911.192394
## iter  30 value 799.390594
## iter  40 value 721.933101
## iter  50 value 663.958628
## iter  60 value 640.235111
## iter  70 value 608.727570
## iter  80 value 598.471414
## iter  90 value 584.579601
## iter 100 value 576.468977
## final  value 576.468977 
## stopped after 100 iterations
## # weights:  311
## initial  value 3427.415484 
## iter  10 value 1307.407646
## iter  20 value 1243.678442
## iter  30 value 1193.740721
## iter  40 value 1164.234227
## iter  50 value 1127.911428
## iter  60 value 1083.470810
## iter  70 value 1054.059140
## iter  80 value 1021.664811
## iter  90 value 1015.310274
## iter 100 value 1007.905692
## final  value 1007.905692 
## stopped after 100 iterations
## # weights:  63
## initial  value 3383.376796 
## iter  10 value 1345.276729
## iter  20 value 1300.168880
## iter  30 value 1194.642741
## iter  40 value 1134.449723
## iter  50 value 1107.503529
## iter  60 value 1077.695311
## iter  70 value 1071.355559
## iter  80 value 1068.267254
## iter  90 value 1066.446914
## iter 100 value 1065.615205
## final  value 1065.615205 
## stopped after 100 iterations
## # weights:  187
## initial  value 2029.661197 
## iter  10 value 1081.762901
## iter  20 value 927.632827
## iter  30 value 805.516547
## iter  40 value 712.511264
## iter  50 value 671.139174
## iter  60 value 647.611142
## iter  70 value 629.082089
## iter  80 value 616.890728
## iter  90 value 606.795822
## iter 100 value 598.907570
## final  value 598.907570 
## stopped after 100 iterations
## # weights:  311
## initial  value 2422.250373 
## iter  10 value 1094.289596
## iter  20 value 920.936921
## iter  30 value 760.339151
## iter  40 value 650.246251
## iter  50 value 559.113333
## iter  60 value 518.542820
## iter  70 value 488.564706
## iter  80 value 462.834197
## iter  90 value 445.102080
## iter 100 value 436.804607
## final  value 436.804607 
## stopped after 100 iterations
## # weights:  63
## initial  value 1577.572285 
## iter  10 value 1198.785539
## iter  20 value 1132.400553
## iter  30 value 1116.847205
## iter  40 value 1101.763925
## iter  50 value 1095.588742
## iter  60 value 1039.350657
## iter  70 value 977.091467
## iter  80 value 956.538441
## iter  90 value 955.012365
## iter 100 value 954.945619
## final  value 954.945619 
## stopped after 100 iterations
## # weights:  187
## initial  value 1674.881852 
## iter  10 value 1031.573809
## iter  20 value 872.495157
## iter  30 value 820.422821
## iter  40 value 788.533444
## iter  50 value 758.711444
## iter  60 value 721.025965
## iter  70 value 689.156675
## iter  80 value 666.791812
## iter  90 value 639.580818
## iter 100 value 620.646015
## final  value 620.646015 
## stopped after 100 iterations
## # weights:  311
## initial  value 2555.903847 
## iter  10 value 1308.904249
## iter  20 value 1048.244252
## iter  30 value 891.505784
## iter  40 value 815.036183
## iter  50 value 777.475934
## iter  60 value 677.668684
## iter  70 value 616.893522
## iter  80 value 573.039006
## iter  90 value 540.655331
## iter 100 value 512.695080
## final  value 512.695080 
## stopped after 100 iterations
## # weights:  63
## initial  value 3283.025944 
## iter  10 value 1328.592392
## iter  20 value 1164.156251
## iter  30 value 1115.090754
## iter  40 value 1078.954337
## iter  50 value 1056.582597
## iter  60 value 1026.486122
## iter  70 value 1007.157239
## iter  80 value 1001.340176
## iter  90 value 984.197580
## iter 100 value 974.129963
## final  value 974.129963 
## stopped after 100 iterations
## # weights:  187
## initial  value 2263.427622 
## iter  10 value 1355.699634
## iter  20 value 1182.243563
## iter  30 value 945.647850
## iter  40 value 867.622227
## iter  50 value 829.670591
## iter  60 value 806.488205
## iter  70 value 797.012232
## iter  80 value 788.396233
## iter  90 value 784.905656
## iter 100 value 780.681420
## final  value 780.681420 
## stopped after 100 iterations
## # weights:  311
## initial  value 1589.441434 
## iter  10 value 1031.986803
## iter  20 value 744.108023
## iter  30 value 644.053670
## iter  40 value 563.149442
## iter  50 value 483.715371
## iter  60 value 434.578638
## iter  70 value 408.367711
## iter  80 value 375.957515
## iter  90 value 362.905541
## iter 100 value 351.926566
## final  value 351.926566 
## stopped after 100 iterations
## # weights:  63
## initial  value 2529.539263 
## iter  10 value 1134.251782
## iter  20 value 1037.074577
## iter  30 value 1014.120172
## iter  40 value 991.297491
## iter  50 value 961.644500
## iter  60 value 937.274705
## iter  70 value 937.107436
## final  value 937.106141 
## converged
## # weights:  187
## initial  value 2559.787401 
## iter  10 value 1297.140286
## iter  20 value 1143.732171
## iter  30 value 980.146176
## iter  40 value 857.954247
## iter  50 value 793.130902
## iter  60 value 755.597075
## iter  70 value 727.940746
## iter  80 value 690.901960
## iter  90 value 671.276626
## iter 100 value 651.985236
## final  value 651.985236 
## stopped after 100 iterations
## # weights:  311
## initial  value 3117.697165 
## iter  10 value 1365.023561
## iter  20 value 1272.683998
## iter  30 value 1198.810553
## iter  40 value 1160.070718
## iter  50 value 1145.642212
## iter  60 value 1138.558994
## iter  70 value 1131.556107
## iter  80 value 1127.904974
## iter  90 value 1121.969313
## iter 100 value 1111.494485
## final  value 1111.494485 
## stopped after 100 iterations
## # weights:  63
## initial  value 1556.722768 
## iter  10 value 1138.330553
## iter  20 value 1046.589165
## iter  30 value 1026.277868
## iter  40 value 1017.625726
## iter  50 value 1015.990954
## iter  60 value 1015.426726
## iter  70 value 1015.418645
## final  value 1015.418610 
## converged
## # weights:  187
## initial  value 3233.699428 
## iter  10 value 1349.985727
## iter  20 value 1219.304023
## iter  30 value 1064.733674
## iter  40 value 972.544342
## iter  50 value 901.201412
## iter  60 value 853.422880
## iter  70 value 819.724130
## iter  80 value 794.272660
## iter  90 value 770.723301
## iter 100 value 755.292334
## final  value 755.292334 
## stopped after 100 iterations
## # weights:  311
## initial  value 3868.682905 
## iter  10 value 1315.788557
## iter  20 value 1186.902221
## iter  30 value 1090.066560
## iter  40 value 1020.174774
## iter  50 value 969.281293
## iter  60 value 943.637771
## iter  70 value 901.708424
## iter  80 value 841.484181
## iter  90 value 751.534065
## iter 100 value 686.286113
## final  value 686.286113 
## stopped after 100 iterations
## # weights:  63
## initial  value 2608.284859 
## iter  10 value 1234.227798
## iter  20 value 1078.450026
## iter  30 value 1002.915905
## iter  40 value 973.679779
## iter  50 value 951.961269
## iter  60 value 925.680719
## iter  70 value 922.410038
## iter  80 value 921.374317
## iter  90 value 920.684054
## iter 100 value 920.565155
## final  value 920.565155 
## stopped after 100 iterations
## # weights:  187
## initial  value 2518.331288 
## iter  10 value 1116.551728
## iter  20 value 881.101784
## iter  30 value 740.499797
## iter  40 value 691.328076
## iter  50 value 643.626376
## iter  60 value 617.611942
## iter  70 value 602.284671
## iter  80 value 591.412444
## iter  90 value 589.307323
## iter 100 value 588.515404
## final  value 588.515404 
## stopped after 100 iterations
## # weights:  311
## initial  value 3052.162069 
## iter  10 value 1361.786388
## iter  20 value 1176.910698
## iter  30 value 1016.512352
## iter  40 value 817.960418
## iter  50 value 659.144344
## iter  60 value 592.442464
## iter  70 value 550.896164
## iter  80 value 529.710532
## iter  90 value 498.877650
## iter 100 value 474.004920
## final  value 474.004920 
## stopped after 100 iterations
## # weights:  63
## initial  value 1899.636084 
## iter  10 value 1036.324031
## iter  20 value 949.627243
## iter  30 value 906.060246
## iter  40 value 885.023549
## iter  50 value 868.077492
## iter  60 value 854.896772
## iter  70 value 853.660755
## iter  80 value 853.503962
## iter  90 value 853.462484
## iter 100 value 853.422882
## final  value 853.422882 
## stopped after 100 iterations
## # weights:  187
## initial  value 1739.222098 
## iter  10 value 984.109927
## iter  20 value 798.463186
## iter  30 value 732.617305
## iter  40 value 674.308693
## iter  50 value 633.708848
## iter  60 value 611.188451
## iter  70 value 595.276752
## iter  80 value 576.329586
## iter  90 value 575.142496
## iter 100 value 574.807640
## final  value 574.807640 
## stopped after 100 iterations
## # weights:  311
## initial  value 1525.533752 
## iter  10 value 876.367123
## iter  20 value 644.598447
## iter  30 value 505.781980
## iter  40 value 452.160907
## iter  50 value 407.012779
## iter  60 value 385.467274
## iter  70 value 379.171017
## iter  80 value 375.314192
## iter  90 value 371.296787
## iter 100 value 363.857457
## final  value 363.857457 
## stopped after 100 iterations
## # weights:  63
## initial  value 2472.202434 
## iter  10 value 1136.726793
## iter  20 value 1058.886541
## iter  30 value 1037.300970
## iter  40 value 1015.540107
## iter  50 value 995.574267
## iter  60 value 970.351794
## iter  70 value 945.810520
## iter  80 value 935.634410
## iter  90 value 929.342508
## iter 100 value 928.151748
## final  value 928.151748 
## stopped after 100 iterations
## # weights:  187
## initial  value 2862.651924 
## iter  10 value 1299.174235
## iter  20 value 1117.162341
## iter  30 value 1019.689721
## iter  40 value 949.458240
## iter  50 value 930.590853
## iter  60 value 892.321187
## iter  70 value 794.502517
## iter  80 value 716.656799
## iter  90 value 685.831947
## iter 100 value 662.816633
## final  value 662.816633 
## stopped after 100 iterations
## # weights:  311
## initial  value 2928.129963 
## iter  10 value 1682.443606
## iter  20 value 1232.417212
## iter  30 value 962.607831
## iter  40 value 797.560769
## iter  50 value 696.820756
## iter  60 value 621.152133
## iter  70 value 583.928074
## iter  80 value 524.384659
## iter  90 value 495.957486
## iter 100 value 450.427832
## final  value 450.427832 
## stopped after 100 iterations
## # weights:  63
## initial  value 1770.286310 
## iter  10 value 1101.162421
## iter  20 value 960.843482
## iter  30 value 928.983828
## iter  40 value 906.936558
## iter  50 value 890.654883
## iter  60 value 886.500436
## iter  70 value 883.108037
## iter  80 value 882.463408
## iter  90 value 882.031413
## iter 100 value 881.605834
## final  value 881.605834 
## stopped after 100 iterations
## # weights:  187
## initial  value 2067.371823 
## iter  10 value 1291.545455
## iter  20 value 1157.765135
## iter  30 value 1064.583904
## iter  40 value 1007.406845
## iter  50 value 981.923592
## iter  60 value 958.621443
## iter  70 value 924.190660
## iter  80 value 899.884786
## iter  90 value 880.927630
## iter 100 value 857.441396
## final  value 857.441396 
## stopped after 100 iterations
## # weights:  311
## initial  value 3347.105160 
## iter  10 value 1192.830909
## iter  20 value 956.351540
## iter  30 value 821.708011
## iter  40 value 707.395586
## iter  50 value 643.206519
## iter  60 value 567.318699
## iter  70 value 533.068684
## iter  80 value 515.795686
## iter  90 value 496.066276
## iter 100 value 486.038465
## final  value 486.038465 
## stopped after 100 iterations
## # weights:  63
## initial  value 3102.311649 
## final  value 1436.011228 
## converged
## # weights:  187
## initial  value 1497.004636 
## iter  10 value 1092.773648
## iter  20 value 969.596087
## iter  30 value 865.666841
## iter  40 value 752.093616
## iter  50 value 709.806576
## iter  60 value 681.450673
## iter  70 value 660.883270
## iter  80 value 645.911386
## iter  90 value 645.201044
## iter 100 value 645.190086
## final  value 645.190086 
## stopped after 100 iterations
## # weights:  311
## initial  value 2335.197361 
## iter  10 value 1030.757951
## iter  20 value 749.624722
## iter  30 value 533.708191
## iter  40 value 441.667561
## iter  50 value 402.899809
## iter  60 value 374.289962
## iter  70 value 357.347297
## iter  80 value 338.868189
## iter  90 value 325.534785
## iter 100 value 315.748515
## final  value 315.748515 
## stopped after 100 iterations
## # weights:  63
## initial  value 1596.606826 
## iter  10 value 1139.764903
## iter  20 value 1070.835783
## iter  30 value 1040.742377
## iter  40 value 1026.765741
## iter  50 value 1025.352351
## iter  60 value 1024.876820
## iter  70 value 1024.865278
## final  value 1024.864798 
## converged
## # weights:  187
## initial  value 2020.771259 
## iter  10 value 1063.683206
## iter  20 value 961.251369
## iter  30 value 886.510415
## iter  40 value 838.646332
## iter  50 value 812.040584
## iter  60 value 789.006645
## iter  70 value 764.384341
## iter  80 value 744.965112
## iter  90 value 735.899952
## iter 100 value 730.304757
## final  value 730.304757 
## stopped after 100 iterations
## # weights:  311
## initial  value 1914.175835 
## iter  10 value 990.901853
## iter  20 value 794.647576
## iter  30 value 675.483532
## iter  40 value 620.263285
## iter  50 value 595.839288
## iter  60 value 575.870148
## iter  70 value 554.026939
## iter  80 value 536.716036
## iter  90 value 523.821286
## iter 100 value 508.041712
## final  value 508.041712 
## stopped after 100 iterations
## # weights:  63
## initial  value 1802.454274 
## iter  10 value 1167.698725
## iter  20 value 1038.274865
## iter  30 value 1009.238110
## iter  40 value 997.040877
## iter  50 value 972.507209
## iter  60 value 959.009336
## iter  70 value 958.013201
## iter  80 value 956.544935
## iter  90 value 956.114850
## iter 100 value 955.641293
## final  value 955.641293 
## stopped after 100 iterations
## # weights:  187
## initial  value 2756.657631 
## iter  10 value 1189.149479
## iter  20 value 1047.663677
## iter  30 value 984.862802
## iter  40 value 939.573283
## iter  50 value 906.368445
## iter  60 value 887.076003
## iter  70 value 879.833740
## iter  80 value 877.827854
## iter  90 value 875.441951
## iter 100 value 873.523817
## final  value 873.523817 
## stopped after 100 iterations
## # weights:  311
## initial  value 2881.355597 
## iter  10 value 1172.586231
## iter  20 value 901.447850
## iter  30 value 696.650050
## iter  40 value 555.161749
## iter  50 value 492.534502
## iter  60 value 456.160350
## iter  70 value 441.404881
## iter  80 value 430.888635
## iter  90 value 424.115781
## iter 100 value 419.097993
## final  value 419.097993 
## stopped after 100 iterations
## # weights:  63
## initial  value 2009.923788 
## iter  10 value 1190.521417
## iter  20 value 1050.849721
## iter  30 value 1003.678960
## iter  40 value 974.485055
## iter  50 value 952.372387
## iter  60 value 940.459147
## iter  70 value 940.195302
## iter  80 value 940.186378
## final  value 940.186349 
## converged
## # weights:  187
## initial  value 3290.921948 
## iter  10 value 1381.055005
## iter  20 value 1379.653393
## final  value 1379.645213 
## converged
## # weights:  311
## initial  value 2299.495305 
## iter  10 value 1001.234496
## iter  20 value 801.874217
## iter  30 value 704.508557
## iter  40 value 646.499906
## iter  50 value 592.222928
## iter  60 value 515.198358
## iter  70 value 481.942953
## iter  80 value 465.234256
## iter  90 value 459.485341
## iter 100 value 455.718763
## final  value 455.718763 
## stopped after 100 iterations
## # weights:  63
## initial  value 2735.216633 
## iter  10 value 1170.140506
## iter  20 value 1028.129860
## iter  30 value 997.984015
## iter  40 value 987.060912
## iter  50 value 986.016167
## iter  60 value 984.473988
## iter  70 value 984.180493
## iter  80 value 984.032451
## final  value 983.959763 
## converged
## # weights:  187
## initial  value 2064.147501 
## iter  10 value 1269.432254
## iter  20 value 1148.619811
## iter  30 value 967.923917
## iter  40 value 891.790549
## iter  50 value 841.202663
## iter  60 value 802.050052
## iter  70 value 775.528895
## iter  80 value 742.249540
## iter  90 value 691.279323
## iter 100 value 664.739475
## final  value 664.739475 
## stopped after 100 iterations
## # weights:  311
## initial  value 1582.216870 
## iter  10 value 934.004520
## iter  20 value 720.251774
## iter  30 value 602.874610
## iter  40 value 538.482333
## iter  50 value 493.858449
## iter  60 value 469.152776
## iter  70 value 453.236116
## iter  80 value 433.788090
## iter  90 value 413.132196
## iter 100 value 398.517667
## final  value 398.517667 
## stopped after 100 iterations
## # weights:  63
## initial  value 2412.083490 
## iter  10 value 1119.487478
## iter  20 value 993.833100
## iter  30 value 944.182920
## iter  40 value 917.346929
## iter  50 value 895.979083
## iter  60 value 859.869168
## iter  70 value 854.636025
## iter  80 value 854.099974
## iter  90 value 852.174554
## iter 100 value 850.845916
## final  value 850.845916 
## stopped after 100 iterations
## # weights:  187
## initial  value 1503.662229 
## iter  10 value 956.067509
## iter  20 value 817.478174
## iter  30 value 740.497213
## iter  40 value 686.367298
## iter  50 value 654.068442
## iter  60 value 639.356774
## iter  70 value 632.765689
## iter  80 value 631.160283
## iter  90 value 629.821264
## iter 100 value 629.394787
## final  value 629.394787 
## stopped after 100 iterations
## # weights:  311
## initial  value 3320.661393 
## iter  10 value 1359.349265
## iter  20 value 1218.992199
## iter  30 value 1035.086449
## iter  40 value 868.214616
## iter  50 value 717.659909
## iter  60 value 609.338560
## iter  70 value 561.325556
## iter  80 value 540.353643
## iter  90 value 517.207027
## iter 100 value 496.047921
## final  value 496.047921 
## stopped after 100 iterations
## # weights:  63
## initial  value 2796.000630 
## iter  10 value 984.382315
## iter  20 value 891.245433
## iter  30 value 857.542019
## iter  40 value 837.207030
## iter  50 value 801.605533
## iter  60 value 765.438340
## iter  70 value 764.160128
## final  value 764.158814 
## converged
## # weights:  187
## initial  value 2305.161801 
## iter  10 value 1312.450796
## iter  20 value 1217.717473
## iter  30 value 1146.275214
## iter  40 value 1108.111849
## iter  50 value 1016.301622
## iter  60 value 985.189918
## iter  70 value 946.205736
## iter  80 value 875.843618
## iter  90 value 799.311685
## iter 100 value 777.118108
## final  value 777.118108 
## stopped after 100 iterations
## # weights:  311
## initial  value 2098.363447 
## iter  10 value 1086.851051
## iter  20 value 863.942784
## iter  30 value 714.153097
## iter  40 value 612.520788
## iter  50 value 558.033821
## iter  60 value 533.114499
## iter  70 value 512.711450
## iter  80 value 505.997632
## iter  90 value 502.294605
## iter 100 value 490.826291
## final  value 490.826291 
## stopped after 100 iterations
## # weights:  63
## initial  value 1996.199272 
## iter  10 value 1254.881410
## iter  20 value 1052.537550
## iter  30 value 949.337412
## iter  40 value 927.668856
## iter  50 value 915.248476
## iter  60 value 911.003555
## iter  70 value 909.386041
## iter  80 value 908.212276
## iter  90 value 906.883337
## iter 100 value 905.581678
## final  value 905.581678 
## stopped after 100 iterations
## # weights:  187
## initial  value 2063.348109 
## iter  10 value 1144.721798
## iter  20 value 997.098189
## iter  30 value 879.663916
## iter  40 value 816.896472
## iter  50 value 736.412444
## iter  60 value 688.972743
## iter  70 value 665.353019
## iter  80 value 654.004438
## iter  90 value 648.023443
## iter 100 value 640.733863
## final  value 640.733863 
## stopped after 100 iterations
## (similar nnet convergence traces for the remaining resampling folds omitted)
dNN_test$score = predict(nn_mod, newdata=dNN_test)

perf_met(dNN_test)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed   1020   76
##     Left       89  137
##                                           
##                Accuracy : 0.8752          
##                  95% CI : (0.8562, 0.8925)
##     No Information Rate : 0.8389          
##     P-Value [Acc > NIR] : 0.0001244       
##                                           
##                   Kappa : 0.5494          
##                                           
##  Mcnemar's Test P-Value : 0.3502014       
##                                           
##             Sensitivity : 0.6432          
##             Specificity : 0.9197          
##          Pos Pred Value : 0.6062          
##          Neg Pred Value : 0.9307          
##              Prevalence : 0.1611          
##          Detection Rate : 0.1036          
##    Detection Prevalence : 0.1710          
##       Balanced Accuracy : 0.7815          
##                                           
##        'Positive' Class : Left            
## 

feature_imp(nn_mod$finalModel)


Insight

  • This baseline neural network performed similarly to the logistic regression: high accuracy and specificity, but low sensitivity. In the next step, the samples will be weighted to correct for the class imbalance.
  • The variable-importance values differ dramatically from those of the previous two models. The model also appears to favor numeric features even though they were scaled; this warrants further investigation.

2. Model Tuning and Cross Validation
  • Samples will be weighted to correct the class imbalance in the Attrition label.
  • The following code tunes the hyperparameters size and decay over a grid.
  • Repeated cross-validation will be used to select the amount of regularization (decay).
weights = ifelse(dNN_train$Attrition == 'Left', 0.84, 0.16)
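The hard-coded weights 0.84 / 0.16 mirror the class frequencies (attrition prevalence is roughly 16%, per the confusion matrices above). As a sketch, the same weights can be derived from the labels themselves rather than hard-coded; the short Attrition vector below is a toy example for illustration only:

```r
# Toy illustration: derive per-sample weights from the class frequencies,
# giving the minority class ("Left") the larger weight.
attrition <- factor(c("Left", "Stayed", "Stayed", "Stayed", "Left"))
p_left  <- mean(attrition == "Left")                    # minority-class frequency (0.4 here)
weights <- ifelse(attrition == "Left", 1 - p_left, p_left)
```

Applied to dNN_train, where p_left is about 0.16, this reproduces the 0.84 / 0.16 values above.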

fitControl <- trainControl(method = "repeatedcv",
                           number = 5,
                           repeats = 3,
                           returnResamp="all",
                           savePredictions = TRUE,
                           classProbs = TRUE,
                           summaryFunction = twoClassSummary)
paramGrid <- expand.grid(size = c(3, 6, 9, 12, 15), decay = c(1.0, 0.5, 0.1))

set.seed(1234)
nn_mod_1<- train(Attrition ~ 
                         Age + 
                         BusinessTravel + 
                         Department + 
                         DistanceFromHome + 
                         Education +
                         EducationField + 
                         Gender + 
                         JobLevel + 
                         JobRole + 
                         MaritalStatus + 
                         MonthlyIncome + 
                         NumCompaniesWorked + 
                         PercentSalaryHike + 
                         StockOptionLevel + 
                         TotalWorkingYears +
                         TrainingTimesLastYear + 
                         YearsAtCompany + 
                         YearsSinceLastPromotion +
                         YearsWithCurrManager + 
                         EnvironmentSatisfaction + 
                         JobSatisfaction +
                         WorkLifeBalance +
                         JobInvolvement + 
                         PerformanceRating + 
                         AvgHrs,
                          data = dNN_train,  
                          method = "nnet", # Neural network model 
                          trControl = fitControl, 
                          tuneGrid = paramGrid, 
                          weights = weights, 
                          trace = FALSE,
                          metric="Sens")

plot(nn_mod_1)

varImp(nn_mod_1)
## nnet variable importance
## 
##   only 20 most important variables shown (out of 60)
## 
##                                  Overall
## GenderMale                        100.00
## AvgHrs                             95.51
## TrainingTimesLastYear              78.01
## YearsWithCurrManager               74.30
## JobRoleLaboratory Technician       69.33
## DistanceFromHome                   68.67
## StockOptionLevel.C                 56.44
## BusinessTravel.Q                   54.63
## JobLevel^4                         51.38
## Education^4                        49.12
## NumCompaniesWorked                 48.65
## WorkLifeBalance^4                  47.91
## DepartmentResearch & Development   47.04
## YearsAtCompany                     46.96
## JobRoleSales Representative        46.74
## JobRoleResearch Scientist          46.72
## JobInvolvement.Q                   46.23
## JobSatisfaction.C                  45.84
## TotalWorkingYears                  44.48
## EnvironmentSatisfaction.Q          43.04
dNN_test$score = predict(nn_mod_1, newdata=dNN_test)

perf_met(dNN_test)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stayed Left
##     Stayed   1084    5
##     Left       25  208
##                                           
##                Accuracy : 0.9773          
##                  95% CI : (0.9678, 0.9846)
##     No Information Rate : 0.8389          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9191          
##                                           
##  Mcnemar's Test P-Value : 0.0005226       
##                                           
##             Sensitivity : 0.9765          
##             Specificity : 0.9775          
##          Pos Pred Value : 0.8927          
##          Neg Pred Value : 0.9954          
##              Prevalence : 0.1611          
##          Detection Rate : 0.1573          
##    Detection Prevalence : 0.1762          
##       Balanced Accuracy : 0.9770          
##                                           
##        'Positive' Class : Left            
## 

feature_imp(nn_mod_1$finalModel)

Conclusion

The best-performing model was the Random Forest with mtry = 11 and ntree = 501.

Key driving features were:

  • AvgHrs
  • YearsAtCompany
  • TotalWorkingYears
  • MaritalStatus
  • Age

Of these, the only feature the company can influence for its current employees is the average number of hours worked. Employees who work longer hours tend to leave the company at a higher rate, so limiting overtime and encouraging regular eight-hour shifts may improve employee retention.